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NorBERT WIENER 


Norbert Wiener was born in Columbia, Mo., on 
November 26, 1894. He received the A.B. degree 
from Tufts College in 1909, and did graduate work 
at Cornell and Harvard Universities, receiving from 
the latter the M.A. degree in 1912 and the Ph.D. 
degree in 1913. The following year he was awarded 
the John Thornton Kirkland Fellowship at Harvard 
which enabled him to study with Bertrand Russell 
in Cambridge, England and with Hilbert in Got- 
tingen, Germany. He continued his studies with 
Russell and Hardy in England as Frederick Sheldon 
Fellow in 1915 and concluded the year with work 
at Columbia University. 

After returning to this country, Dr. Wiener held 
various academic and commercial positions. He 
was Docent Lecturer in Harvard’s Department of 
Philosophy for one year, and thereafter an instruc- 
tor in Mathematics at the University of Maine. 
During 1917-1918 he worked for General Electric 
Corporation at Lynn, Mass. Before entering the 
army for a tour of duty at Aberdeen Proving 
Ground in Maryland, he was a staff writer for En- 
cyclopedia Americana. In 1919 he worked briefly 
for the Boston Herald. 

Since 1919, Dr. Wiener has been a faculty mem- 
ber of the Mathematics Department at the Mas- 
sachusetts Institute of Technology, becoming a full 


Professor in 1932. During this time he has lectured 
extensively in the United States and abroad. In 
1926 he was in Gottingen and Copenhagen under a 
Guggenheim Fellowship, in 1929 he became Ex- 
change Professor of Mathematics at Brown Univer- 
sity, and in 1931 spent one year as a lecturer at 
Cambridge University. His final position abroad 
before World War II was that of Visiting Professor 
at Tsing Hua University in Peiping, China. 


In 1947, Dr. Wiener collaborated with Dr. Arturo 
Rosenblueth at the National Institute of Cardiology 
in Mexico. He lectured in the College de France of 
the University of Paris as a Fulbright Teaching 
Fellow in 1951, and during 1955-1956 was a visiting 
Professor at the Indian Statistical Institute in 
Calcutta. 


Numerous prizes have been awarded to Dr. 
Wiener during his notable career, among them the 
Bowdoin Prize from Harvard in 1914, the Bocher 
Prize of the American Mathematical Society in 
1933, and the Lord and Taylor American Design 
Award in 1949. Tufts College and the University of 
Mexico awarded him honorary Sc.D. degrees in 
1946 and 1951 respectively. 


He is a member of the American Mathematical 
Society and the London Mathematical Society. 
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CZ 


What is Information Theory? 


NORBERT WIENER 


NFORMATION THEORY has been identified 
in the public mind to denote the theory of infor- 
mation by bits, as developed by Claude E. 
Shannon and myself. This notion is certainly impor- 
tant and has proved profitable as a standpoint at 
least, although as Dr. Shannon suggests in his edi- 
torial, ““The Bandwagon,” the concept as taken from 
this point of view is beginning to suffer from the 
indiscriminate way in which it has been taken as a 
solution of all informational problems, a sort of 
magic key. I am pleading in this editorial that Infor- 
mation Theory go back of its slogans and return to the 
point of view from which it originated: that of the 
general statistical concept of communication. A mes- 
sage is to be conceived as a sequence of occurrences 
distributed in time to be considered not exclusively 
by itself, but as one of an ensemble of similar se- 
quences. As such it comes under the theory of time 
series which is an important branch of statistical 
theory with a rapidly developing technique and set 
of concepts of its own. This theory is closely allied 
to the ideas of Willard Gibbs in statistical mechanics. 
What I am urging is a return to the concepts of this 
theory in its entirety rather than the exaltation of 
one particular concept of this group, the concept of 
the measure of information into the single dominant 
idea of all. 
I am pleading for this more particularly because 
the Gibbsian point of view is showing an applicability 
and fertility in many branches of science other than 


communication theory and in my opinion in all 
branches of science whatever. It is generally recog- 
nized that the quantum theory which now dominates 
the whole of physics is at root a statistical theory; 
although it is perhaps not yet as generally recognized 
as it should be, the quantum theory is strictly a 


June 


branch of the theory of time series. Professor Armand ~ 


Siegel and I are among those now working in this field. 

What I am here entreating is that communication 
theory be studied as one item in an entire context of 
related theories of a statistical nature, and that it 
should not lose its integrity by becoming a special 
vested interest attached to a certain set of slogans 
and clichés. I hope that these TRANSACTIONS may 
encourage this integrated view of communication 
theory by extending its hospitality to papers which, 
while they bear on communication theory, cross its 
boundaries, and have a scope covering the related 
statistical theories. In my opinion we are in a dan- 
gerous age of overspecialization. To me the danger of 
this period is not primarily that we are studying 
very special problems that the development of science 
has forced us to go into, but rather that we’are in 
great danger of finding our outlook so limited that 
we may fail to see the bearing of important ideas 
because they have been formulated in what our 
organization of science has decreed to be alien terri- 
tory. I hope that these TRANSACTIONS may steadily 
set their face against this comminution of the 
intellect. 


Za 


956 


IRE TRANSACTIONS ON INFORMATION THEORY 


49 


Optimum, Linear, Discrete F iltering of Signals 
Contaimmg a Nonrandom Component’ 


KENT R. JOHNSONt 


Summary—The problem of filtering nonrandom signals from 
tationary random noise has recently received considerable atten- 
ion. The filter design procedure developed by Wiener is not 
pplicable in this case since that procedure is predicted on the 
ssumption that the signal to be filtered is stationary and random. 
vately, both Booton and the team of Zadeh and Ragazzini have 
veloped optimum filters for the smoothing of nonrandom signals; 
owever, both of these filters are of the continuous type, whereas 
4 many applications in which discontinuous control is used there 
s need for discrete filters for such signals. This paper presents 
quations governing the design of a discrete version of the Zadeh- 
kagazzini filter. The input signal is assumed to be the sum of a 
.onrandom polynomial and a stationary random component and is 
ssumed to be obscured by stationary random noise. 

An approximate formula for the output noise power of an optimum 
Iter designed to make a zero-lag estimate of either its input 
unction or one of the derivatives thereof is derived for the important 
pecial case in which the noise is white and the signal is a non- 
andom polynomial. A brief discussion is given of the use of the 
ilter with nonrandom, nonpolynomial signals. 


INTRODUCTION 


N THE PAST few years there has been considerable 
interest-in the design of filters for the separation of 
nonrandom signals from noise. Since a wide class of 

unctions can be approximated by polynomials it is often 
onvenient to assume that the nonrandom signal to be 
eceived is such a function. A suitable degree for the 
olynomial, and the time interval over which the approxi- 
nation will be valid, can be estimated from a priori 
nowledge of the quantity to be measured. 

Recently Booton’ has presented equations governing 
he design of optimum, time varying, continuous, linear 
ters which may be used with signals containing non- 
andom as well as random components. Using his pro- 
edure it is possible to design a filter to yield an optimum 
stimate of the result of any desired time varying linear 
peration on the input. Unfortunately Booton’s article 
loes not contain a general solution to the equations 
eveloped and does not give any estimate of the filter’s 
utput noise power. 

Zadeh and Ragazzini’ have developed the optimum, 
ontinuous, time invariant, linear filter, having specified 
1emory time, for use with signals consisting of both a 
onrandom polynomial and a stationary random com- 
onent. They assume the signal to be obscured by station- 
ry random noise. Their filter, like Booton’s, differs 


* The work in this paper was done at the Ramo-Woolridge Corp. 
1 Los Angeles, Calif. 

+ School of Engineering, University of Pittsburgh, Pittsburgh, Pa. 

1R. C. Booton, Jr., “An optimization theory for time-varying 
near systems with nonstationary statistical inputs,” Proc. IRE, 
ol. 40, pp. 977-981; August, 1952. ; 7 

2. A. Zadeh and J. R. Ragazzini, ‘An extension of Wiener’s 
ieory of proediction,” J. Appl. Phys., vol. 21, pp. 645-655; July, 
)50. 


from the Wiener filter* in that that filter is designed to 
operate on a stationary random signal only. The Zadeh- 
Ragazzini filter reduces to a Wiener filter in the case when 
the nonrandom part of the input is zero and the memory 
time is infinite. 

Both the Booton filter and the Zadeh-Ragazzini filter 
are continuous. Because, however, the use of discontinuous 
control is indicated in many applications (to achieve 
increased accuracy, for example), the development of 
discrete filters for use with nonrandom signals seems 
worthwhile. Accordingly this article is devoted to the 
development and preliminary investigation of a discrete 
analog of the Zadeh-Ragazzini filter. A set of linear 
equations will be derived whose solution is the set of 
weighting coefficients to be used in the filter. In addition, 
some approximate formulas will be developed which give 
the ratio of output to input noise power for the case of 
optimum (least square sense), linear, zero-lag, unbiased, 
discrete filters having specified memory time and yielding 
as output an estimate of either the input signal or a 
derivative thereof. The formulas are valid for the special 
case in which the input is a polynomial and the obscuring 
noise is white.* The term “zero-lag’” indicates that the 
filter estimates the value of the quantity of interest at 
the time of the latest sample. 


DERIVATION OF FitTER EQUATIONS 


Consider an ensemble of signals, each consisting of a 
polynomial and a stationary random component. Let 
each signal be obscured by stationary random noise. It 
should be understood that the polynomial, denoted by 
P(t), is not random; 7.e., it remains the same over the 
entire ensemble. The polynomial’s degree is known, but 
its coefficients are not. The random part of the signal, 
M(t), and the random noise, N(¢), do, of course, vary over 
the ensemble. It will be assumed for convenience that 
both M(t) and N(t) have zero ensemble mean. The 
problem is to determine what linear combination of a 
fixed number of equally spaced samples of the input, 
(7.e., signal plus noise), will give the optimum, in the 
sense of the least ensemble mean squared error, zero-bias 
estimate of the desired output function. 

The entire signal may be written as 


S@ = P@) + MM) (1) 


3 Norbert Wiener, “Extrapolation, Interpolation, and Smoothing 
of Stationary Time Series,” John Wiley and Sons, Inc., New York, 
N.Y.; 1950. 

4That is, it is assumed that the correlation time of the noise is 
small compared to the interval at which the input is sampled. 
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and the input to the filter as 
e(t) = S(t) + N(O). (2) 


The output of the filter is then 


CAC S We (t — nh) (3) 


n=0 
where the w, are constants yet to be determined. It is 


assumed that the sampling interval, denoted by h, 1s 
constant. Let the desired filter output be 


S*(t) = AS(2) (4) 


where A is an operator which may be represented as a 
linear combination of derivatives. Define the output 
error, ¢, by 

e = S*(t) — e,(2) (5) 
and stipulate that 


ee N 


AS(t) — e(t) = AP() — me w,P(t — nh) = 0, (6) 


a 


so that the estimate of S*(¢) is unbiased. An unbiased 
estimate is required because it seems desirable to have 
zero output error when there is no input noise. This 
seems a reasonable requirement since in application the 
input noise level may vary considerably. The bar in (6) 
indicates an ensemble average. In obtaining this equation 
the zero mean requirement on M(t) and N(¢) has been 
used. 
The polynomial P(t — nh) may be written as 


Pe phy » Ca pO, (7) 


where P‘’(t) is the jth derivative of P(t) and k is the 
degree of the polynomial. Substituting (7) into (6) yields 


.< (—nh)’ Gi) - (—h)'n,; Gi) 
ALO a Oe eee ete) 
j=0 n=0 j: 7=0 J: 
in which 
N . 
uj, = do wan’ (9) 


is the jth moment of the weighting coefficients w,. If 
AP(t) is now written as a Taylor’s Series, the two sides 
of (8) may be compared term by term to yield k + 1 
constraints, in the form of the required values of the 
moments y;, on the weighting coefficients, w,. Thus, if A 
is the identity operator 


Mo = 1 
wu =O gO. 
If A is the first differential operator, 
1 
mF 


By using (6) for AP(¢) it is now possible to write the 


output noise, €, as 
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e= AM(t) — > w,[M(t — nh) + N(t — nh)]. 


n=0 


(1C 


The ensemble average of the noise power then becomes 


where 
van) = [AM(O)|M(t — nh) (12 
vuln, i) = M(t — nh)M(t — th), (12 
and 
vw(n, i) = N(t — nh)N(t — Ah). (14 


In obtaining (11), W() and N(é) have been assume 
uncorrelated. = 

It is now desired to minimize € subject to the k + 
constraints expressed by (8). This may be done by tk 
usual method of Lagrange multipliers. Consider th 
expression 


& k N 
I = &€(w,) — Ds Np ye teen. (1 


n=0 
The A; are constants. In order for a set of w, to minimiz 
e’ subject to the constraints previously mentioned, it 
necessary that there exist a set of A; such that the quantit 
I is rendered stationary by the aforementioned set of w) 
That is, it is necessary to find a set of w, and a set of } 
such that all the equations 


ol 


and 


are satisfied. These are N + k + 2 equationsin N + k + 
unknowns. In the case at hand they are linear in the 2 
and the \;. Eqs. (16) may be written as 


N k 
2 3s wn, 1) — Da, jn’ = 2.) n=0---N Ga 
and 
N ; 
DS wyni = p; 4=0-:--k, 
where 
Vn, 1) = Vuln, ) te Yr n, 1). (1: 


| 
Hqs. (11) and (17a) may be combined to yield an ce) 


pression for the minimum value of «, denoted by e 
Thus, 


cn = [(AMOFP — Y wv) +5 Ome 
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qs. (17) may be written in matrix form as 
VON OM) Foose ¥(0,N) 11 0 0 0 ]f wo ¥4(0) 
ei at 1 Wy va(1) 
Udi, 22 eee a a dass v,(2) 
VND) Wa ean WN Niel aN NE NU || x va(N) 
rea Ae 5 ae | ba ee (20a) 
Xo 
Tesi ht 1 ! ot lo 
Oma? N 
Oui 2k Nie 
0 
| 
| 
oN 
0 1 Be va al o ME 
i vyv(0, 0) Bots Yv(0, N) 
Be Ale 2 B (20b) Uy =: (28) 
B’ 0 R Q Yw(N, 0) gies Yw(NV, N) 
there the submatrices in (20b) are as shown above. The and 
rime on B indicates its transpose. With the matrix v 
otation the error ¢,, becomes Die a “with pi ORO) (29) 


én = [AM — J(BR + P), (21) 


1 which each side of the equation is a matrix with one 
lement. Eq. (20b) may be inverted to yield 


eT] = 
R ee LANE 
‘here 
B= — vo BBW 'B) BY (23) 
bie Bib YB) (24) 
N =—(B'V"B)~. (25) 
‘hus the weighting coefficients are given by 
J = bP FO. (26) 


‘he actual matrix inversions indicated in (28), (24), and 
25) are in general quite tedious. 

If the signal consists of a polynomial alone: 7.e., M = 0, 
22) through (25) may be combined with (21) to yield a 
ompact expression for the ratio ¢, to p, the total input 
olse power. 


2 


Em a Q'(B'Yn Hah Q a OMB’ DBO: 
Pp Pp 


(27) 


[ere 


The matrix D depends only on the shape of the correlation 
function Wy(z, ); it is independent of the total noise 
power p. Thus (27) indicates that when the signal is a 
polynomial the ratio €,/p is independent of input noise 
power; 7.e., the output noise power is a linear function of 
input noise power. 

Because the number of equations, (V + k& + 2), which 
determine the weighting coefficients is large, the use of a 
digital computer may well be indicated for their solution. 
It is possible, however, for an interesting special case, to 
obtain an approximate formula for the noise power 
output of the filter without actually determining the 
weighting coefficients. This formula is presented below. 

Sarg nee [eee oy i 
p> IND ON Saar) ip ned <2 yctae 

The derivation of (30) has been relegated to an Appendix 
because of its length. There it is shown that the above 
result is valid under the following conditions. 


(30) 


1) The signal is a pure polynomial; z.e., W(t) = 0. 

2) The correlation time of the noise is small compared 
to the sampling interval. 

3) The number of samples used in the filter is large 
compared to the degree of the polynomial on which 
the filter is designed to operate, (V >> k). 


= 


Notice that in (30) the product AN is the filter memory 
time 7’. Thus, for a given number of samples, and a given 
degree of polynomial, the ratio of rms, output noise to 
rms, input noise will vary inversely as 7’ for a jth order 
differentiating filter. This behavior could have been 
predicted from an examination of less elaborate differ- 
entiating discrete filters. The inverse variation of the 
above mentioned ratio with the square root of the number 
of samples, for fixed memory time, may be interpreted 
loosely as the usual improvement in accuracy which results 
from making repeated measurements of the same quantity, 


FILTERING NONRANDOM, NONPOLYNOMIAL SIGNALS 


The point in going to some length to develop a device 
which is capable of filtering a nonrandom polynomial 
from noise lies, of course, in the fact that many non- 
random signals of interest may be approximated by 
polynomials over lengths of time of sufficient duration 
to allowappreciable discrimination against noise. Questions 
which immediately arise in designing a filter for a specific 
application are, what degree should be assumed for the 
polynomial, and what memory time should be used? 
It is at once evident that interacting factors influence the 
choice of these two quantities. Thus, for a given poly- 
nomial degree and sampling interval, the noise filtering 
improves as memory time is increased. On the other hand, 
as memory time increases, if the signal is not a true poly- 
nomial the degree of polynomial which is assumed to 
represent the signal must usually be increased. This 
increase in assumed polynomial degree decreases the 
capacity of the filter to discriminate against noise, as is 
revealed by an inspection of (30). 

At first glance it might be thought that arbitrarily 
great noise discrimination could be achieved by designing 
a filter to operate on a polynomial of sufficiently high 
degree to adequately approximate the signal over a 
length of time sufficient to provide the desired noise 
filtering. Additional work which is not in a sufficient 
state of completion to warrant publication leads the 
author to feel that while this is true for certain classes of 
signals; e.g., polynomial, it is not true in general. Proof 
of this has not yet been obtained, however. In practice it 
will usually be necessary to effect a compromise between 
good noise discrimination, with long memory time and 
attendant difficulties in curve fitting, and good curve 
fitting with short memory time and limited noise filtering. 


DISCUSSION 


The work just presented contains sufficient information 
for the design of an optimum linear discrete filter to 
operate on a signal composed of a stationary random 
component and a nonrandom polynomial. An approxi- 
mate expression for output noise power has been presented 
for the special case in which the signal consists of only a 
nonrandom polynomial and in which the correlation time 
of the noise is small compared to the sampling interval. 
Perhaps the most notable deficiency in the present treat- 
ment lies in the absence of any investigation of the filter’s 
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steady state transfer function. While some work has beet 
done in this direction, it was not felt that the materia 
was in a sufficient state of completion to warrant it 
presentation. 


APPENDIX 


Approximate Expression for Output Noise Power 


In the case when the signal consists of a pure poly 
nomial; 7.e., M/(¢) = 0, and when the operator A is eithe 
the identity operator or a derivative operator, the ex 
pression for output noise power given in (19) reduces t 
a single term. Thus 


Fagte 
Em = 


(No summation on J). (31 


ZA jj. 
Here 7 is the order of the derivative represented by th 
operator A. If A is the identity operator, 7 is zero. Thi 
value of uw; is determined by A as was described previously 
thus, it is only necessary to find \; in order to determin: 
the output noise power. The problem will be simplifiec 
by assuming that the correlation time of the noise i 
small compared to the sampling interval. This assumptio1 
causes the matrix y to become diagonal. For ease ii 
computation the noise power, p = ¥ (0, 0), will be take 
to be unity. Since the output noise power is a linea 
function of the input noise power under the condition 
of the case being considered, this does not constitute an: 
restriction on the generality of the result. 

Under the assumption made in the preceding para 
graph (20a) becomes 


1 iQn s/s OMe lines Oana cas cue) Wo 
Q wipe ee OR Pia ta i eee Ww, 
OO) 1 ely aN N* || wy 0 
d 
it it _- 
l 2 v 
(32 
r 
1 N —= 
0 9 0 
0 
| 
ae \ 
Om Nia eo Li; 
} 
- \ f 
0 1 N* —=* 
9 0 
An application of Crammer’s rule yields 
2 [DOE ey eee a © 
Nk a8, . 


where M/ is the determinant of the square matrix in (32 
and My.j;+2,+;+2 1s the minor of its N + 7 + 2,N+ 7+ 
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lement. Both of these determinants may be simplified 
y the following procedure. 


1) Add —1 times the elements of the first column to 
the elements of the N + 2 column. 

2) Add —1 times the elements of the second column to 
the elements of the N + 2 through the N + k + 2 
column. 

3) Add (—2)’ times the elements of the third column 
to the elements of the N + 7 + 2 column, where 7 
goes from 0 through k. 

4) Continue the process just outlined until the elements 
of the submatrix B are all replaced by zeroes. 


The resulting determinant can immediately be reduced 
1 order from N + k + 2 tok + 1. Thus 


Me= 
3 


Niet 


vi 3 (=1)*" p=1 p= p=1 (34) 
N : N : N : 
2 p 2d prt 2, p" 


p=1 p=1 
N N 
k Gaal q 
Mysise.neiee2 = (—D) Dep ey 
p=l1 p=l 
N N 
ye g+1 »s 7+2 
Pp Pp 
p=1 1 


For low values of k it is possible to evaluate \,; im- 
1ediately. If this is done for & = 1 and 2 and the result 
sed in (31), the following expressions for noise power are 
btained. 


=i 
x 12 Sh 

ce NN YN Ee)” We GD) 
=o) 

x 12(2N + 1)(8N — 3) j=1. @9 


ey (Neo Cne eis)” 
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Similarly, for = 2and7 = 2° 


2 
Em 


| 720 
~ NAN? + 1)(N + 2)(N + 3) 


(38) 


Kq. (36) is the output noise power of an NV + 1 sample, zero 
lag, measurement of velocity when displacement is a first 
degree polynomial. Eqs. (37) and (38) are the corresponding 
expressions for zero lag measurements of velocity and 
acceleration, respectively, when displacement is a second 
degree polynomial. In all cases, of course, the input noise 
power is taken to be unity. 

The direct evaluation of the determinants of (34) and 
(835) becomes quite tedious when k is large. To further 
simplify the problem of finding output noise, it is possible 
to utilize the fact that often the solutions of greatest 
interest are those for large values of N. In order to effect 
the simplification so afforded, it will be necessary to 
digress for a bit on the evaluation of sums of the form 


N 
SN arp (39) 
n=0 
Let 
p” = pp — D@—2)---@—n+1), 0) 
where 7 is a positive integer. Then 
N ; N ; N ‘i 
Dp ea pas Dep 
p=1 p=l1 p=1 
N N N 
7 gee k+1 
D peje we 
p=1 p=1 p=1 
ab N Ne k 7, 5 
PL eT = TED Ae (35) 
p=lh p=1 p=1 
N : N ¥ N : ; 
Dips pene ome 
p=1 p=1 p=1 
N : N ' N . 
»> Dp tee sD pe ee 2 D 
p=1 p=l p= 
(1) ZS Dp 
Os (D0. = a p = (41) 
=pp— Vp —2—=p — sp a22p 
etc. 


It is easy to see from (41) that it is possible to express 
p” in terms of a summation of the form. 


p= iD” a Cie ae ap a eee ope (42) 


+ : : a (Cy : he . rede 
Notice that the coefficient of p” will always be unity. 


, -, agen 
The reason for putting the p” in terms of the p” is that 
it is very simple to sum terms in p’”’ over integer values 
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of p. Thus 


st bi) i Deas p=N+1 (N a yy ney + poy ' 
hie me bane > oie 
N 1 (n+1) 
rae ao ; (43) 


The relation given in (43) is an elementary result in the 
calculus of finite differences. It is now apparent that by 
using (42) and (43) it is possible to write a summation of 
the type appearing in (39) as a polynomial in N of degree 
n + 1. Thus every element in the determinant M will be 
a polynomial whose degree is one greater than the power 
to which p is raised in the summation which the poly- 
nomial represents. Furthermore, from (42) and (43), it 
is apparent that in each polynomial the coefficient of the 
highest power of N will be the reciprocal of that power. 
It is possible to write VW in the form 


A ea yt al a yey Ronan yee (44) 


where a repeated index indicates a summation. The 
factor b,, is the element in the jth row and the mth column 
of M. The quantity 


has the value zero if any two of the z, are equal. If none 
of the indices are equal it has the value +1 or —1, de- 
pending on whether an even or odd number of inter- 
changes of indices is required to put the indices in the 
order 1, 2, 3, --- k + 1. Since each element b,) is a poly- 
nomial of degree 7 + m — 1, it is apparent that each 
term in the summation of (44) will be a polynomial. 
Also, since the lower indices of the b; in (44) must, in 
any term with a nonzero coefficient, be a permutation of 
the integers 1 through & + 1, the degree, D, of each term 
must be 


1 1 
J 2 3 
L 1 ue 
2) 3 A 
ih it is 
3 4 5 
Vig = (Say 
1 1 1 
i) Geir desl ae 
1 1 i 
gra ehoe Fo 2 
1 1 1 
ee ey a ye 
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k+1 k+1 


Dadi e) aay 


D 


k+1 


2°) Gey alee (45) 


Thus it is established that each term of (44) will be a 
polynomial of degree (k + 1)’. From this it follows that 
M itself will be such a polynomial and that the coefficient 
of the (k + 1)’ power of N will be the sum of the co- 
efficients of this power of N in the individual terms of 
(44). But in each term of (44) the coefficient of this power 
of N is just the product of the coefficients of the highest 
powers of N in the individual factors b,; appearing in that. 
term. Thus, the coefficient of the (k + 1)’ power of N 
in the determinant M may be found by replacing in (44), 
each element b,, by the coefficient of the highest power of 
N appearing in that element. This has been shown to be 
(j + m — 1)”. The resulting expression is just the ex- 
pansion of the determinant. 


: (Sea files 
2 3 se I 
iL ih 1 aes 
2 3 4. k+2 
ARE eas ci al 1 if 1 | 6 
3 4. 5 k+3 
1 1 il an il 
k+1lk+2k+8 2k + 1 


In a similar fashion it is easy to show that the highest. 
power of N in the determinant My.;+2,v+;4218 (k + 1)? — 
27 — 1. Also, it can be shown that the coefficient of this 
term is 


1 alge aloe) 

j ae 2 k + 

1 1 1 
era eae k+2 

| abd ae 
spas iicraiee k+3 

(47) 

1 1 1 
27 Wh, S29 al rad 

1 1 1 
ie Bones k+j+2 
a 1 bot ‘ae 
k+-+g k+j+2 2k + 
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Thus to obtain an approximation for \; which is valid 
r large values of N it is only necessary to substitute in 
e numerator and denominator of (33) the terms con- 
ining the highest power of N appearing in My¥;42,"+j42, 
id M, respectively. This yields 


Ns Vi BN oa Cie 
Py gil Gavalyne cue 
There remains the task of evaluating the ratio V;,,;/Q,.° 


recurrence relation involving the determinant Q, may 
» derived as follows. 


Vin 


= iO yet (48) 


1) Subtract the k + 1 column of Q, from all other 
columns. 

2) Factor all quantities common to an entire row or 
column. 

3) Subtract the k + 1 row of Q, from all other rows. 

4) Repeat step 2. 


follows immediately that 


(k!)* 
(2k + 12h! Qe. 


Q:, = (49) 


1 an entirely similar fashion 


(kk + j + 1)’ 


Vig = 
ba (2k + 1)'(2k) Wk — J) 


2 Vig es (50) 


hus 


Vii ee (k ae Gh es Vie, 
Q,, C= I Qe-1 
_&+G+D? + +2 Vir 
ear gana Ls Qi 


5 The method used for this evaluation is due to Dr. G. J. Gleghorn 
the Ramo-Wooldridge Corporation. 


(51) 
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Voi 5 Ou (52) 


so, combining (49), (51), and (52), the following result is 
obtained. 


Fat [GE it BUY 1 (53) 


Q: (K—-p)! J Q@+EDGD* 


If (53) is substituted into (48) to yield —2;,2, and the 
result is substituted into (31) an expression for output 
noise power results. 


rah latnia, [eeet oly 1 
Nive {el Raa) (27 + 1(j))* 
The input noise power is assumed to be unity. The 


moment uy; may be evaluated from (8). When the operator 
A represents the jth derivative 


(54) 


nm 
Zw 
| 


| 
pi = 3 (1) (55) 
So 
= 1 Coes) iy 
CNS ae I (k— 9)! J +1 eo) 


This is the result presented in (80). 
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Spatial Filtermg in Optics’ 


EDWARD L. O’NEILLT 


Summary—Starting with the formulation of H. H. Hopkins for 
the image forming properties of an optical system in terms of a 
coherence factor over the object plane, the two extreme cases of 
complete coherence and incoherence are considered. The incoherent 
case is treated briefly as a low-pass spatial frequency filter. 

In the case of coherent illumination, it is shown that the optical 
analog of such well-known electrical concepts as equalization [17], 
edge-sharpening, and the detection of periodic and isolated signals 
in the presence of noise can be carried out with relative ease. A 
detailed theoretical treatment of the problem together with illus- 
trations emphasizes the analogy between optical and electrical 
filtering. 


List oF SYMBOLS AND FourtEr RELATIONS’ 


i(x;, y;) = point image amplitude distribution. 

i(x;, Y¥;) = point image intensity distribution. 

6(X, Yo) = complex amplitude transmission of 
object. 


0(%, Yo) = object intensity distribution. 
i(v;, ¥:) = image amplitude distribution. 
U(x;, Ys) = image intensity distribution. 


“partial coherence factor’? in object 
plane. 


. e ml o,f — 
(Xo, Yo; Xo, Yo) = 


7(u, v) = complex transmission of aperture. 
t(u, v) = transfer function. 

O(n, v) = object amplitude spatial spectrum. 
O(u, v) = object intensity spatial spectrum. 
T(u, v) = image amplitude spatial spectrum. 
I(u, v) = image intensity spatial spectrum. 
T(u, v) = intensity variation over the source. 


All functions denoted by the same small and capital letter 
are Fourier Transform pairs, e.g., 


6c) = [| Olu, Nei") du dv. 


—o 


INTRODUCTION 


HERE has been sufficient evidence over the past 
al few years to indicate that the concepts of electrical 

communication theory have become a permanent 
fixture in the field of optics [1], [2], [4], [6], [12], [21-23], 
so that a detailed description of the electrical-optical 
analog is superfluous at this point. Nevertheless, it can 
be pointed out, in retrospect, that most of the emphasis 
thus far has been confined to the passive role of determin- 
ing the performance of the optical system [8], [14-16], 
[23-25], namely, the evaluation of the transfer function 


* This work was supported by the Aerial Reconnaissance Lab., 
Wright Air Development Center, under Air Force Contract No. 
AF 33(616)-432. 

+ Boston Univ. Phys. Res. Labs., Boston, Mass. 

1' This notation follows that of Linfoot and Fellgett [11] in which 
the “hooked” (+) variables denote complex amplitude and the 
‘“nhooked” variables denote intensity. 


or sine wave response curve. A study of the exceptions 
to this statement shows that, even in these cases, strictly 
speaking, a communication theory approach has not been 
employed [10], [13]. 

With the publication of an article by Maréchal and 
Croce [17] on increasing the contrast of photographs, it 
became apparent that the more general problem of 
detecting and recognizing signals in the presence of noise 
might be approached through the concepts of electrical 
communication theory. Not only was the general synthesis 
of an optical system possible, but it appeared immediately 
to involve some simplifications as compared to its electrical 
counterpart. : 

Rather than consider each optical system as a specia 
problem, it was decided to treat the general problem o} 
image formation by including a factor which describes 
the illumination over the object plane. For this reasor 
the formulation of Hopkins [8], [9] was adopted to evaluate 
the final intensity distribution in terms of the ‘“partia. 
coherence factor.’”’ It was then possible to discuss the 
two extreme cases of completely coherent and incoherent 
systems. The former, being linear in amplitude, coulc 
be handled from the Fourier standpoint. The latter 
being linear in intensity, was also subject to Fouriei 
analysis and synthesis along with certain basic limitations 
However, since in the coherent case, control over the 
Fourier components that make up the image could bc 
exercised with relative ease, it was possible to carry out ¢ 
series of simple experiments that bore a one-to-one 
correspondence to known filtering operations in electrica' 
communication theory. The paper presents both a theo. 
retical and experimental treatment of the general formu 
lation of the problem together with illustrations of the 
results. 


Tue Fourter ANALYSIS AND SYNTHESIS OF COHEREN® 
AND INCOHERENT OPTICAL SYSTEMS 


Generalized Image Formation 


In any comprehensive treatment of image formatio1 
in an optical system, it is convenient to incorporate : 
factor which describes the mode of illumination over thy 
object plane. For monochromatic systems Hopkins [8] 
[9] has succeeded in doing this in terms of a “partia 
coherence factor.”’ Because of the comprehensive characte 
of such a formulation, this term will be employed in th 
analysis to be presented in this paper. However, onl: 
the two extremes of complete incoherence and coherence 
will be considered here. 

As a starting point, consider an element of the soure 
do (Fig. 1) producing a complex disturbance @(a, Yo) a 
the point (a, yo) in the object space where %(a, yo) is 
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plution of the scalar wave equation. The resultant 
mplex transmitted amplitude is now given by A(2o, yo) 
Xo, Yo). To determine the image distribution é(x,, y,), 
is necessary to integrate the point image distribution 
y:) over the object plane in the following manner: 


II ix, — Xo, Yi — Yo) U(X, Yo) 6(xo, Yo) Axo dyo. 
(1) 


Ls Yi) = 


OBJECT 
PLANE 


APERTURE 
PLANE 


IMAGE 
PLANE 


Fig. 1—Generalized image formation. 


_ However, the eye or any physical instrument is sensitive 
9 intensity variations, hence we must determine the 
nage distribution from an independent point in object 
pace (x5, ys) and perform a similar integration over the 
omplex conjugate of the functions in (1). We have then 
di(x;, Yi) oa WD; ; Yi) dealers Yi) do (2a) 


rt 
ile.,y) = do [fff fe = 20,4. = ttl. = 2, 9. = 9) 


Wo, Yo)O*(X5, Yo) (Lo, Yo) O*(x4, Yo) ASo ASS 


rhere 


(2b) 
dso = di, dyo--and dSi = dxi.dyp « 


‘nally, adding up the contributions from the source in 
he final integral gives 


©, Yi) = iif I WU(Xo, Yo)U* (xo, Yo) do 


tx; = dbg Ue Yo) E*(x; i Xo, Wo = yo) 


- 6(X0, Yo) O* (x5, Yo) ASo ASo . (3) 


Yow, the term inside the brackets is Hopkins’ partial 
oherence factor y(%o, Yo; ®, Yo) Which, for a plane source, 
; given by 


es to, a) = // Dele ee eS Ce ily) (A) 


here I'(u, v) describes the intensity variation and 
eometry of the source, and is zero outside the source. 
can be defined in terms of the contrast of the Young’s 


fringes in the (u, v)” plane formed by pinholes at (xo, Yo) 
and (x, yo) when illuminated by a source whose geometry 
and intensity variation is given by I'(u, v). Substitution 
of (4) into (8) results in the intensity distribution over the 
image plane in the compact form 


Ux, Yi) = fill (X05 Yo3 Lom Yo) 


E(x, Set CO Une ie Yovt* (a; ray ai Ue Ys) 


IOC 9 5° Yo)'0* Chg, VOua Se doen (5) 
Though (5) appears as a rather formidable formulation 
of the final image intensity variation, the problem simpli- 
fies considerably when either of the two extremes of 
complete coherence and incoherence are considered. 
Frequently it is convenient to carry out the analysis of 
a system, not in terms of the independent space variables, 
but rather in the corresponding spatial frequency domain. 
Hence, proper manipulation with Fourier integrals, 
similar to Hopkins’ treatment of Fourier series, yields 


Ca) = fill uy vs a WOH, NOU!» * 


OP eed Pa ah BONIS) du dy du’ dy’ (6) 
where 
T(u, V; Bs v’) oe I I\(s, t) 
-#(s + yu, t+ vee + w’, t+’) ds dt (7a) 


and 
CG)e— I WG; oe dae OYo: (7b) 


With this generalized formulation in mind, we will now 
focus our attention on the problems of the Fourier analysis 
and synthesis of optical systems which are linear in 
intensity (incoherent) and linear in amplitude (coherent). 
Before distinguishing between the analysis and synthesis 
of incoherent systems, it seems appropriate at this point 
to discuss the physical significance of completely inco- 
herent illumination with respect to (5), (6), and (7). To 
begin with, when dealing with a self-luminous object, we 
should expect only point-to-point coherence or incoherence 
since every point in object space radiates independently 
of every other point. Mathematically, such a situation 
can be described in terms of the well-known Dirac “delta- 


2 The relation between the cartesian coordinates (wu, v) in Fig. 1 
and the reduced coordinates (u, v) is 


ku kv 
D . . . 
where k = 27/) is the wave number of the radiation and p is the 


distance between the object and aperture plane in Fig. 1. 
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function.” Thus describing the partial coherence factor 
by the delta function and making use of (7b), we can, 
conceptually at least, replace the self-luminous object by 
a transparent one illuminated by an “effective source,” 
infinite in extent. In a similar fashion, for a transparent 
object illuminated by a large extended source, we can 
consider such an object self-luminous for all practical 
purposes, provided the source is large enough [8], [9]. 
Since this type of illumination is the one most frequently 
encountered, most of the effort thus far in optics has been 
directed toward the analysis of incoherent systems. 


Incoherent Illumination 


Analysis: In a broad sense, the analysis of a linear 
system can be defined as the determination of the Fourier 
frequency components that the system passes to make up 
the output. In the optical case, the sine wave response of 
the system is determined as a function of the spatial 
frequency from a knowledge of the geometry and complex 
transmission of the aperture; e.g., coating, aberrations, etc. 

To illustrate this, consider (5) where, for the incoherent 
case, the partial coherence factor y(%, Yo; %6, Yo) 18 a 
“delta” function. This gives 


ites) = [fff ae = 26, -— 


E(x, 7. Yot* a; Senieiy =" 405) 


(8) 


Integrating over d2xjdyj and making use of the sifting 
property of the ‘‘delta’”’ function, we have 


-6(20, Yo) O* (xh, yo) dito dyo dxé dy}. 


NUD es I | te: — 2, ¥: — Yo) |” | 6(a0, Yo) |'-dxo dyo 
TA (9) 

OL 

ie, y) = ff te. = 2, ys = woleo, yo) Ato dy» (10) 


—o 


where we see that the system is linear in intensity. (See 
Fig. 2.) The convenience of the Fourier approach becomes 
apparent when we perform a Fourier transformation of 
(10) to the spatial frequency domain and in the process 
replace the rather cumbersome convolution integral by a 
multiplication process. This gives 


I(u,v) = r(u, v)O(u, v). (11) 


Thus we may consider the transfer or system function 
t(u, v) Operating on the spatial spectrum of the object 
intensity distribution to produce the spatial spectrum of 
the image intensity distribution. The term 7(u, v) has 
been assigned a variety of names in recent years, and, in 
fact, may be defined in a number of ways. For the sake of 
clarity and simplicity, we will define it in this paper as a 
measure of the reduction in contrast, and relative phase 
shift in passing from object to image space, of a sinusoidal 
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= it t(x, = Lig Ue = Yo) O(Xo, Yo) dx» dyo 


—o 


I(u,v) = ru, v)O(u, ») 


Fig. 2—A linear incoherent optical system. 


Ux: Yi) 
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intensity variation of increasing spatial frequency. It, 
units along the abscissa will be in reciprocal length (lines, 
mm), and it will be called the ‘‘transfer function.” 

A review of the literature [16], [18], [23-25] shows tha, 
most of the attention thus far in recent years has beey 
focussed upon the evaluation of 7(u, v). The reason for thi 
is apparent when we consider the various Fourier relation: 
that exist in systems that are diffraction limited. Fo, 
example, one way of defining r(u, v) mathematically is iy, 
terms of the Fourier transform of the point image intensity 
distribution; 7.e., 


7H, ?) = I (ene |e  aemaue (12a 
or 
| 


t(u,v) = I (4;, Yue, ye OY aa Ou. (12b 


—-a« 


which, after application of a well-known theorem for thy 
Fourier transform of a product, gives | 


ru) = ff ew dw! tae +) dul’ (8 


where 7(u’, v’) describes the complex transmission anc 
geometry of the aperture and is zero outside the aperture 
For a uniform circular aperture with aberrations, we hav 
the following: 

SEO: ike yp! G A 


0 Lee SCA, 


(14) 


(un, ve) ! 


where A denotes the area of the aperture and wher 
A(u’, v’) is the deviation of the actual wave front from : 
reference sphere (the aberrations). Because of this apparen: 
simplification, it should be possible from lens design dati 
to substitute the proper coefficients (e.g., Seidel aberratioi 
coefficients) into the expression for A(u’, v’) and perforn 
the convolution operation described by (13). In practice 
of course, because of the form of (14), the integral can 
rarely be evaluated in closed form. Nevertheless, despit: 
the practical difficulties associated with this approach th 

, | 
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| Sa of an incoherent optical system as a two- 
ensional low-pass spatial frequency filter has strong 
onceptual appeal. 
| One advantage of such an approach, (13) and (14), is 
nat in theory at least, one can evaluate the transfer 
inction from a knowledge of the aberration coefficients 
ithout ever examining in detail the complicated physical 
iffraction pattern. 
Synthesis and Limitations: In place of the passive role 
ne assumes in analyzing linear systems, one might wish 
» take a more active part in altering the system’s per- 
mance by changing the image forming properties of 
ie system and, hence, the transfer function. One method 
: doing this is to apply complex amplitude coatings over 
re aperture [10], [13], and [19] and hence alter the point 
nage distribution. An extreme in this direction would be 
) place a central obstruction in the aperture so that the 
ransparent area is annular [18], [25]. In fact, it might be 
ossible to optimize the frequency response within a 
iven prespecified spatial bandwidth of particular interest.” 
Although several of these schemes have met with some 
iccess for special problems, the flexibility afforded to 
datial filtering operations through generalized synthesis 
; quite limited with incoherent systems. A little reflection 
‘ill show that this must be the case. Basically, we are 
ealing with the addition of nonnegative intensity varia- 
ons everywhere in the image plane which will always 
ive rise to background or constant levels, so that all 
1coherent systems behave as low-pass spatial frequency 
Iters.* Bearing this in mind, we are led to the obvious 
onclusion that general spatial filtering can be performed 
nly in systems in which destructive as well as con- 
ructive interference can occur. That is, we must be able 
» control amplitude and phase and, therefore, we are 
d to a consideration of the image forming properties of 
oherent systems. 


oherent Illumination 


Analysis: It might be well to preface this section with 
1e statement that the optical system to be discussed here 
ould be described by the microscopist as one employing 
point source Kohler illumination.” In terms of the 
juations already derived for the more general case, it is 
ow necessary to impose the condition that, for a point 
yurce, I'(u, v) is given by the Dirac delta function. Under 
us condition (5) becomes 


3 For example, one possible criterion might be to minimize the 
ean square difference between the aberrated system and an ideal 
1e Over a given spatial frequency range, 7.e., 


— f(r = 2)" de 
Wo — W, W1 
here 7) = transfer function in the ideal case (no aberrations) 
id + = actual transfer function. One might then determine the 
rrect balancing of aberration coefficients to make e a minimum. 
he frequency range w; < w < w» would of course be determined 
‘ the application. 

4This of course is no longer true for electro-optical systems in 
hich the constant term can easily be eliminated [11]. 
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Aer Yi) = [| Ho, — Xo, Yi — Yo) O(Xo, Yo) AXo dYo 


—o 


Lf EC. = 2h, ys = yhO*Cws, Wh) de, dy, 


—o 


(15a) 


or 


(Le, Y:) = (x,, yi)t*(x,, Yi). (15b) 


It is apparent that in terms of intensity, the system is 
nonlinear. Therefore, for convenience, the communication 
theory principles will be applied to a system linear in 
amplitude, tacitly assuming that (15b) will be needed to 
evaluate the resultant intensity variation. 

With this in mind, we can give the image amplitude 
distribution by the convolution integral of the point 
image amplitude distribution over the complex object 
amplitude distribution as follows: 


a., yi) = I a, — Lo, Yi — Yo) O(Xo, Yo) Ato dYo. (16) 


—oo 


Here again it is possible to replace the convolution integral 
by a multiplication process by transforming to the spatial 
frequency domain. This gives’ 


T(u, ») a #(u, v)O(u, »). 


Again this equation may be interpreted as an operation 
on the object amplitude spectrum to give the image 
amplitude spectrum. Now, however, we see that the 
amplitude transfer function is identically given by 


(17) 


#(u, v) which describes the geometry and complex trans- 
mission of the aperture in the (u, v) plane. Thus it becomes 
apparent that the Fourier components that make up the 
image can be controlled by inserting the proper mask in 
this plane. (See Fig. 3.) 


— Xo, Yi x Yo) O(Lo, Yo) dX dYo 


= 4(u, )O(u, ») 


. 3—A linear coherent optical system. 


>This formulation can be arrived at by successively applying 
Huyghen’s principle over the object and aperture planes as shown 
by Rhodes [21], [22] who in turn discusses the conditions under 
which the Fourier transform relations are applicable. 
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Before investigating these possibilities, it seems proper 
to verify the statements given above with a few simple 
examples. Consider first of all a clear unaberrated circular 
aperture. Then 7(u, v) is the familiar cylindrical ‘‘top-hat”’ 
distribution, which “‘filter-wise’” signifies an ideal low- 
pass, two-dimensional filter. There exist several basic 
limitations to perfect imagery in the system under dis- 
cussion. The first is due to the wave character of radiation 
itself [6]. If the condition is imposed that the incident and 
emergent waves in object space must both satisfy the 
scalar wave equation, there results an inequality in terms 
of the wavelength of the light and the spacing between 
detail in the object. For detail closer together than the 
wavelength of the light, the coefficient of the space variable 
in the exponent becomes real and negative, instead of 
imaginary, which results in waves that die out within a 
few wavelengths of the object. This limitation exemplifies 
a well-known principle in communication theory which 
deals with the inability of a carrier to transmit information 
concerning details closer together than the wavelength 
of the carrier. 

The second limitation imposed is due to the finite 
dimensions of the aperture. That is, while all diffracted 
angles between zero and ninety degrees are possible, only 
those Fraunhofer or Fourier orders will enter into the 
formation of the image that intercept the aperture in the 
(u, v) plane. Since this point of view is merely a generalized 
Abbé theory of image formation in the microscope, it 1s 
a simple matter to verify Abbé’s claim, that for periodic 
targets the resolution limit of the system can be doubled 
by employing oblique illumination. 

In order to do this, the point source must be moved off 
axis, which results in a tipped wave passing through the 
object plane. However, this is equivalent to having axial 
illumination with a linear phase plate over the object; 
hence the complex transmission of the object becomes 


6(o, Yo) = 6'(Xo, Yo)e (18) 
where ao, 8) are the direction cosines’ of the incident 
tipped wave. The Fraunhofer spectrum in the yu, v plane 
is given by 


tk( @orto+Boyo) 


O(n, vy) = I 6'(ao, ye nthe nae Oa PEs wal deo BYos (19) 


and, if we define the terms po = kay and » = k®y (19) 
becomes 


O(n Fon Yo) = I 6'(Xo, yao eee eae dx dYo- 


Bi (20) 


That is, the spectrum of the object remains the same but 
is shifted to a new center which is, of course, the geo- 
metrical image of the displaced point source. 

The resultant amplitude in the image plane is now given 
by 


(xi, yi) = I Mu, WOU oy re dua 
(21) 
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Changing variables, we obtain 


foe) 

i(poritvoyi) , 

Crt; ; I Fu — Ko)? 
—oo 


O(n’, pier ee du’ dv’. 


i(x i) Yi) ae Yo) 


(22) 


Hence, the transfer function in amplitude is given by the 
complex transmission of the aperture centered about the 
shifted position as shown in Fig. 4(a). | 

To illustrate this point further, consider a crossed 
sinusoidal target with spatial frequencies w,, w, in the 
x and y directions respectively, such that 


—t(Wztot+ WyYo) 


(23) 


6'(Xo, Yo) = 


Eq. (20) becomes 


O(n Aa ow? ae %) = =r ff: [(utpo-—Wz)to+ (V+v0—Wy) vol dx dyo | 
(24) 


which, except for a constant, is the integral representation 
of a delta function, so that 


O(n, ») — (wy — »)] 


where c is a constant. Substituting this expression into 
(21) and making use of the sifting property of the delta 
function, we obtain 


c° blu Fi Wa: ano) a7, (25) 


EEL OMEN OUR YA) (26) 


CBR Os) ee Cw, — — vo)e 


Moy ®y 
which becomes, after normalizing, 


i(a,, y;) a + Nonis Zeus) (27) 
Now noting that the last term merely describes the 
obliquity of the wave, we see that the object structure is: 
reproduced in the image but that the amplitude is modu- 
lated by 7(w, — so, ®, — ¥). This implies that if the 
Fourier or Fraunhofer orders do not fall within thé region 
of the shifted aperture function 7(u, v), the periodicity. 
wil not be reproduced in the image. (See Fig. 4.) Now, 
since the spectrum is shifted by an amount correspondin 
to the direction cosines of the incident illumination, the 
final contrast and resolution limit are functions of the 
complex transmission of the system, the angular size of 
the aperture, and the direction of illumination. In terms 
of the sine wave resolution limit of the system, it is 
necessary that w, satisfy the following inequality in order 
for the periodicity to be detected in the image: 


Wz Spo + mh (28) 


where »/k represents the direction cosine of incident 
illumination and yu,/k the maximum angular dimension of 
the aperture. Then, 


TO, = May: Oy aan) 0. (Pay ve 


P, < k(sin 6) + sin 6,) 


or, at the resolution limit, 


r 


PAS : 
(NA)oi. + (NA) ooi1. 
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Fig. 4—Comparison of transfer functions. 


For axial illumination 


d 


ss 
(N A) oi. 


(31) 


For oblique illumination with the numerical aperture 
(NA) of the collimator equal to the numerical aperture of 
the objective, 


r 


Pe = OV A)au. 


(32) 
thus verifying Abbé’s claim that the resolution limit for 
periodic structures can be doubled by employing the 
proper oblique illumination. 

We have illustrated rather sketchily what might be 
termed the analysis of a linear coherent optical system. 
It now becomes clear that the synthesis of such a system 
is more promising than the synthesis of the previously 
discussed incoherent case. This is evident from an ex- 
amination of (17) and Fig. 3, where it can be seen that the 
Fourier spectrum that constitutes the image can be 
controlled by inserting the proper mask in the (u, v) plane 
and hence controlling 7(u, v). It is upon this aspect of the 
system that attention will be focussed in the remainder 
of the paper. 

Before formulation of the communication theory 
approach to the problem of spatial filtermg, it should be 
pointed out that this scheme of controlling the Fourier 
spectrum of the image is not new. The obvious illustration 
of this is the phase contrast microscope; however, the 
method first came to the attention of the author in an 
article by Maréchal and Croce [17] on increasing the 
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contrast of photographs by a method known in com- 
munication theory as equalization [4]. Further investi- 
gation showed that Porter [20] in 1906 had performed a 
series of then remarkable spatial filtering experiments in 
verification of the Abbé theory of the microscope. 

Synthesis: The purpose of the preceding section was to 
illustrate how a linear system can be analyzed by Fourier 
methods. It has been shown that the important contri- 
bution of such an analysis is that the rather cumbersome 
convolution process in the space domain can be replaced 
by the more practical multiplication process in the spatial 
frequency domain. It is now evident that the key to such 
an analysis is the transfer function 7(u, v), which operates 
on the input spectrum to give the output spectrum, in this 
case the object and image respectively. In the synthesis 
processes to be discussed, it will be demanded that’ 7(u, v) 
assume special characteristics in order to perform certain 
predescribed operations. 

In the selection of #(u, v) in the remainder of this paper, 
the terminology and experience of electrical communi- 
cation theory will be drawn on wherever applicable so that, 
to a large extent, the analogy between the electrical and 
optical systems will be emphasized. 

A. Equalization: It is well-known in communication 
theory that if we have n linear systems in cascade, the 
final output spectrum is given by 


Poul) = Fi) 72) +++ F2(@)F in) (33) 


where 7,(w) is the transfer function for the k’th system. 
It should be noted in passing that Fourier methods really 
demonstrate their versatility in analyses of this kind since 
the n-fold multiplication process in the frequency domain 
is the counterpart of a series of n-fold convolution integrals 
in the space or time domain. 

In the process of equalization we would like, in theory 
at least, to control 7,,(w) in such a way that it becomes the 
reciprocal of the product of 7\(w)t2(w) +--+ 7,-1(w); that is, 
we would like to design a 7,(w) such that 


1 


Gs (3) es pea 


I] F.(w) 


k=1 


(34) 


The reason for this is obvious when we examine (33). Of 
course in practice such control is not always possible. It 
was in this respect that Maréchal and Croce were able to 
increase the contrast of photographs by inserting in the 
(u, v) plane a mask that was dense at the center and 
gradually became transparent at the edge in such a way 
that the product of #(w) with the primary imaging transfer 
function gave a flat response. The result of inserting such 
a mask is shown in Fig. 5. 

B. Edge Sharpening: The next step in altering the spatial 
spectrum of a photograph is the deletion of the constant 
term. This is similar to inserting a condenser in an electrical 
network and represents a high-pass filter. In terms of 
#(u, v) it merely means the insertion of a small opaque 
spot on axis in the (z, v) plane. The result of this operation 
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Fig. 5—Increase of contrast. 


Object Image 


Fig. 6—Edge sharpening. 


is shown in Fig. 6. Since, from an electrical standpoint the 
process is similar to differentiating the input, it would 
appear possible to add the edge-sharpened photograph to 
the original, optically or photographically. It should be 
noted that any operation such as edge sharpening or 
increasing signal to noise ratio for given geometrical 
shapes affects similar distributions anywhere in the field 
of view, since the Fraunhofer spectrum is formed on axis 
in the (u, v) plane. In addition to this favorable quality, 
the method has an additional property of performing 
filtering operations in two-dimensions at once, as opposed 
to electrical schemes. 
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(©. Grain: In the special case where the distracting 
element in a photograph is the grain, these rapid fluctua- 
tions can be smoothed out as long as the size of the 
desired detail is not of the same order of magnitude. 
There are, of course, a number of other ways of doing 
this since all incoherent systems behave as low-pass 
spatial frequency filters. For example, the photograph 
could be blurred slightly or viewed out of focus. For the 
system under discussion, several photographs were chosen 
that appeared objectionably grainy to the eye. The 
aperture in the (u, v) plane was stopped down by a dia- 
phragm until the visual image appeared noticeably 
smoother, and the image was then photographed. The 
result of rejecting these higher spatial frequency com- 
ponents due to grain is shown in Fig. 7. The effect is more 
pronounced when viewed with some magnification. 


Object Image 


Fig. 7—Isolated detail with grain. 


Detection and Recognition of Signals in the Presence 
of Noise: For convenience, in the remainder of this paper 
it will be assumed that the object function can be broken 
up into the sum of two separate functions, os and oy, the 
signal and noise functions respectively. The reason for 
this is to maintain the electrical communication theory 
analog as far as possible. Bearing this analogy in mind, 
we will now subdivide the types of signals to be discussed 
into the three main categories of periodic, isolated (tran- 
sient), and random detail. 

A. Periodic Signals: It is a well-known principle in 
modern communication theory that an ideal method for 
recovering a periodic signal in noise is to cross correlate the 
signal plus noise distribution with a ‘Dirac comb;” 7.e., a 
series of evenly spaced sharp pulses at intervals corres- 
ponding to the fundamental period of the signal to be 
detected. This scheme has obvious applications in radar 
systems, for example. The optical equivalent of such a 
cross correlation procedure is to insert an opaque mask in 
the (u, v) plane with pinholes at positions corresponding 
to the spectral orders of the periodic portion of the target. 
The image forming lens recombines these spectral orders 
to produce the periodic disturbance in the image. (See 
Fig. 8.) 

Loosely speaking, the opaque mask with the properly 
positioned pinholes can be represented as 
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(35) 


= iy) 


Fu, v) a: 


where w,, = mw,, and w, = Nw, 

Multiplication of #(u, v) with the Fourier. spectrum of 
signal and noise has the effect of passing the discrete 
spectrum of the periodic signal while sampling the con- 
tinuous noise spectrum at discrete intervals. The result 
in the image plane is a reproduction of the periodic signal 
plus a small slowly varying term that oscillates about the 
average noise level. The calculation of the image distri- 
bution can actually be carried out for the case when there 
is little or no correlation between the noise and filter 
distributions [7]. Defining the total object distribution as 
a sum of signal and noise we have 


6(X0, Yo) = Os(%o, Yo) + Ow(Xo, Yo) (36) 


where 


Aso, yo) = > Do ementerneerene 
m n 


The resultant image distribution is a convolution 
integral of the impulse response with the 6(%o, yo) in (36). 
Because a Dirac comb is its own Fourier transform, the 
total integral is the sum of a convolution integral of a 
Dirac comb function with the periodic signal plus the 
convolution of the comb function with the noise. Because 
of the periodicity one may deal with average values of the 
integrals in terms of the cross correlation function. It can 
be shown quite readily that the cross correlation of a 
periodic signal with a Dirac comb function of the same 
period is, except for a constant, a reproduction of the 
periodic signal. By contrast, if the noise and comb func- 
tion are uncorrelated, the result is a constant related to 
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the average noise level. With this in mind, the image 
distribution becomes 


Nae) = KO arenes 


B. Isolated Signals: The solution to this problem is not 
new to the communication engineer [7]; for this reason 
only an outline of the method will be presented here. The 
photointerpretive equivalent of maximizing the peak 
signal to rms noise ratio would be maximizing the prob- 
ability of detecting an isolated piece of detail in unwanted 
background. The more selective recognition problem 
would seem to involve a sharpening or outlining operation. 
For the peak axial signal we may write 


(37) 


peak axial signal ~ I 7(u, v)Os(u, v) du dp (38) 
and for the rms noise 
foo} 1/2 
rms noise ~ I | #(u, v) |? du dy (39) 
so that the ratio becomes 
[f #u, 9)6slu, 2) du a» 
peak axial signal ees | (40) 
rms noise 3 ee 


ih | #(#, ») |? du dv 


where again the subscripts S and N denote signal and 
nolse respectively. Several independent investigators 
have shown that this ratio is a maximum when 


F(u, v) Fe OR(u, v) (41) 


that is, when the filter function is the complex conjugate 
of the desired signal spectrum. With such a filter, it can 
be readily shown that the image distribution is merely the 
autocorrelation function of the signal alone when signal 
and noise are uncorrelated. 

An approximation to this filter can be achieved in the 
optical case by photographing the spectrum of the signal 
alone. The negative transparency can then be used as a 
mask when the signal and noise combination are viewed. 
(See Fig. 9). Note, however, that such a filter does not 
necessarily outline the signal as does the edge sharpening 
operation, but rather maximizes (40) and hence enhances 
the probability of detecting the signal. Finally, as Gold- 
man [7] points out, for a given noise level, the ratio in (40) 
depends only upon the signal energy in the object. This 
would therefore appear to be a starting point for deter- 
mining thresholds in terms of input (object) signal to 
noise ratio. 

C. Random Signals: A detailed treatment of this 
problem is beyond the scope of this paper and present 
considerations. Nevertheless, for the sake of completeness, 
it proves of interest to examine how far we can go in ex- 
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Fig. 9—Isolated signal in the presence of ‘‘noise’’. 


tending the one-dimensional voltage vs time electrical 
considerations to the two-dimensional optical case. In 
such an extension, a suprising simplification is at once 
obvious in the optical problem. As pointed out by Cheat- 
ham and Kohlenberg [2], the fundamental difference 
between optical and electrical filters les in the fact that 
once a filter function has been determined in the electrical 
case, one must then impose the condition that its response 
vanish for t < 0. This restriction is basic in that no amount 
of equipment or ingenuity can circumvent it. Such is not 
the case in the optical system. The impulse or point 
source response in optical systems hes on both sides of 
the space axes and in many cases possesses symmetry. 
From this point of view the design of an optimum filter 
presents some simplification in the optical case. 


CoNCLUSION 


As pointed out recently by Linfoot and Fellgett [12], an 
optical system can be assessed from two distinct aspects. 
First of all, in terms of the primary imaging operation, it 
may be demanded that the intensity distribution in object 
space be transformed into an intensity distribution in 
image space with as high a degree of fidelity as possible. 
Secondly, assuming that it is impossible to exercise 
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control over the primary imaging system, it may be 
demanded of a secondary imaging system that it extract 
as much useful information as possible from the photo- 
graph at the expense of unwanted background detail.” 
It is upon this latter aspect that attention has been 
focussed in this paper. Because the problem is not far 
removed from the problem of detecting signals in the 
presence of noise in electrical systems, the terminology 
of electrical network theory has been adopted wherever 
feasible. 

A cursory examination was made of the image-form- 
ing properties of an optical system in general; the special- 
ized cases of the analysis and synthesis of incoherent 
systems were discussed; and the limitations for such 
systems were mentioned, so far as spatial filtering is 
concerned. From an investigation of the coherent case, 
it became apparent that a method for performing spatial 
filtering could be carried out. To test this hypothesis, 
several experiments were attempted, in which the de- 
termination of the proper filter could be predicted from 
the equivalent electrical network problem. In fact, for 
many of the specialized problems of detecting certain 
geometrical patterns in the presence of unwanted back- 
ground, the determination of the proper mask (filter) to 
insert becomes almost self-evident. Actually, for search 
problems such as this, the amount of information needed 
is analogous to the electrical case; that is, one must know 
either the desired pulse shape approximately or the 
statistical character of the noise (the power spectrum). 
In brief, if overlapping signal and noise are to be separated 
at all, they will be separated in their spectra, and the 
advantage of the optical over the electrical system is 
that such a separation takes place automatically in the 
optical case through the phenomenon of diffraction. 

Finally, the same caution that the electrical engineer 
must exercise 1s warranted here except possibly to a larger 
extent. The insertion of a filter can often times give rise 
to false information in the image and unless properly 
interpreted may, in fact, negate the purpose of the filter- 
ing operation. While this information in the electrical 
case 1s most often put on a recording device and then 
studied objectively, it may happen that the optical image 
is examined visually which creates, as any microscopist 
knows, a number of psycho-physical problems. Whether 
or not this represents a basic limitation to the ideas dis- 
cussed here remains to be determined. 
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_ * This second aspect may imply reconstruction methods such as 
Increasing contrast or edge sharpening, or it may possibly imply 
separation of usable and unwanted background detail in the photo- 
graph through spatial filtering. 
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Effects of Signal Fluctuation on the Detection of 
Pulse Signals in Noise’ 


MISCHA SCHWARTZt 


Summary—The Neyman-Pearson statistical theory on testing 
hypotheses has in previous work been applied to the problem of 
the detection of nonfluctuating constant-amplitude signals em- 
bedded in noise. This work is extended in this paper to the case of 
signal power fluctuating according to a prescribed probability 
distribution. The effect on system performance of possible corre- 
lation between successive signal pulses is taken into account. 

The introduction of signal fluctuation leads in general to some 
loss in system performance as compared to the case of nonfluctu- 
ating signals. This loss is most pronounced when there is complete 
correlation between successive signals, and is quite small when 
successive signals are independent of one another. 


INTRODUCTION 


HE PROBLEM of the detection of pulsed signals 
in the presence of noise has been treated rather 
extensively in the literature."° The Neyman- 

Pearson theory of testing statistical hypotheses®’’ has 
been of particular value in much of this work; detailed 
analyses utilizing the Neyman-Pearson procedure have 
been made for the special case where the signal is assumed 
to be of constant amplitude.’ ° 

It is the aim of this paper to extend the analysis of 
signal detection to include the effect of signal amplitude 
fluctuations. 

This is of particular significance in search radar systems 
where the signal received represents reflected energy 
from an airplane or other complex echoing target. Ex- 
perimental investigations indicate that the signal ampli- 
tude in such cases may fluctuate through ranges of as 
much as 20-30 db as the aspect of the target changes 
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3D. L. Drukey, “Optimum Techniques for Detecting Pulse 
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1232-1236; October, 1952. 

5 Middleton, ‘‘Statistical theory of signal detection,’’ TRANs. 
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7 J. Neyman and E. 8. Pearson, ““Testing of statistical hypotheses 
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pp. 492-510; 1933. 


with respect to the radar system.*’® If the turbulent 
motion of a plane is rapid enough compared to the radar 
system pulse repetition rate, wide fluctuations in reflected 
signal energy may take place from pulse to pulse. 

The application of the Neyman-Pearson theory to 
constant-amplitude signal detection leads to a very 
simple threshold detection scheme for determining the 
presence of signal."”* It can be shown quite simply that 
the same detection system results when the Neyman- 
Pearson procedure is applied to the case of rapidly- 
fluctuating signals. 

This threshold system may be described as follows: 
A group of successive narrow-band IF pulses are detected 
by an envelope detector of the square-law type’ and then 
summed by an appropriate integrating device. The 
resultant sum is required to exceed a fixed voltage thresh- 
old level to be labeled a signal. (Such a signal detection 
system was described in the literature by several authors 
before recognition of the procedure as a statistically 
optimum one in the Neyman-Pearson sense.’' 

The analysis of this system is carried out in this paper 
for three separate cases of signal fluctuation: 

1) Signals completely uncorrelated: Here fluctuations 
in received signal power are so great from pulse to pulse 
that successive received signals may be treated as in- 
dependent random quantities. 

2) Signals completely correlated: The signal power 
received can only be defined statistically (7.e., in a prob- 
ability sense), but, once received on the first return pulse, 
it will remain constant over all pulses in the group. This 
would correspond to the case in airplane detection, for 
example, in which the plane motion is slow compared to 
the time taken for the antenna to move a beamwidth in 
azimuth. 


8D. Kerr, Ed., “Propagation of Short Radio Waves,” M.I.T. 
Rad. Lab. Ser., McGraw-Hill Book Co., Inc., New York, N.Y., 
vol. 13, ch. 6; 1951. 

°J. L. Lawson and G. E. Uhlenbeck, ‘Threshold Signals,”’ 
M.I.T. Rad. Lab. Ser., McGraw-Hill Book Co., Inc., New York, 
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2M. Schwartz, “A performance measurement criterion for 
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3) Signals partially correlated: This intermediate case 


| is analyzed here only for the simplest choice of two 


samples added. The effects of video filtering, nonideal 
adding circuits, and antenna beam-shape are all neglected 
in this analysis.” 


THRESHOLD DETECTION ANALYSIS 


The threshold system can be described statistically in 
terms of two defined probabilities (related directly to 
the two types of error in the Neyman-Pearson tests). 

These are: P,,—the probability of integrated noise 
alone (in the absence of signal) exceeding a voltage 
threshold or bias level, and being labeled signal. A specified 
tolerable value for P,, fixes the threshold level; and 
P,,—the probability that the integrated signal in the 
presence of noise will exceed the chosen threshold level. 

The over-all system performance (envelope detector, 
integrator, and threshold circuit) is rated in terms of the 
signal-to-noise (at the input to the detector) needed to 
achieve a given P,, with P,, fixed. With P,, = 90 per 
cent, for example, and P,, = 10°*° (7.e., the probability 
that noise will exceed the level is at any instant 1 part 
in 10*°), a certain signal-to-noise ratio, S/N, is needed 
if the signal is to be correctly detected (on the average) 
nine times out of ten that it appears. The specific value of 
S/N will of course depend upon the number of pulses, 
K, integrated. 

Specifically, if p,(V) is the probability distribution (or 
density ‘function) of noise voltage at the output of the 
detector-integrator combination and p,(V) the distri- 
bution of signal plus noise, 


P= [ pV) av (1) 


and 


Pa= [piv av C) 
with b, the threshold level in volts. 

The noise distribution, p,(V), is readily determined. 
Thus, the envelope distribution of the Gaussian noise 
assumed at the detector input is given by the familiar 
Rayleigh distribution, 


—r?/2o 


MO) ap (3) 


with y the mean-squared noise voltage at the detector 
input. 
The distribution at the output of a square-law envelope 
detector can then be readily shown to be 
—V/2aso 
€ 
(4) 


pV) = Bon 


where the detector output (in volts) is of the form 


(5) 


2 
V=ary, 
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r the envelope voltage, and a, a constant. 

The convolution of this function K-fold times (assuming 
the noise samples added are independent quantities) 
gives the desired distribution at the output of the in- 
tegrator™ 


Done 
PAX) = (Kom (6) 
with 
ee V 
aka 2a.Wo 
P.,, 18 thus given by 
(os) Gan wee 
Re. Woe say da (1a) 
with da = ee , a normalized 


threshold level. 

This integral has been tabulated by K. Pearson as the 
“Tncomplete Gamma-Function.’’* 

The determination of p,(V), the signal plus noise 
distribution at the output of the integrator, depends 
upon both signal and noise characteristics, and, in par-~ 
ticular, upon the statistical properties of the signal. 

The experimental data available on this latter point 
indicate that the reflected energy from almost all large 
complex targets is given by a proabability distribution 
of the form*’” 


=S/is) 
e S 
S 


with S the instantaneous reflected power and S the 
average reflected power. (S depends upon the particular 
target involved.) This is a fortuitous result, since the 
distribution is of the same form as (4). The signal voltage 
amplitude before detection can thus be assumed to have 
the same Gaussian distribution as the noise, although of 
different mean-squared value. This distribution has also 
been obtained theoretically as the probability distribution 
of echoes from raindrops, sea clutter, etc.*, and will be 
accepted as the distribution to be appliea in most of the 
following analysis. 


(7) 


UNCORRELATED SIGNAL FLUCTUATIONS 


Where the signal plus noise samples may be assumed 
uncorrelated from pulse to pulse, the voltage at the input 
to the detector will be the sum of two normally distributed 
(Gaussian) quantities—signal plus noise—or itself a 
normally distributed function with zero mean value and 
a variance given by the sum of the two variances: 


¢ = Wo + o, = ¥(1 + He (8) 


4K. Pearson, ‘Tables of Incomplete Gamma-Function,’’ HM 
Stationery Office, London, Eng.; 1922. 
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Here yp is the mean-squared noise voltage, o, is the 
mean-squared signal voltage (proportional to S), and 
s is the mean signal-to-noise ratio (power) at the detector 
input. 

The analysis of the integration of AK signal voltages at 
the detector output is thus identical to the procedure for 
the integration of K noise pulses, and the probability of 
signal detection may be written 


e) K-11 -x 
oe i eh ARIE! 
" eae al) bon 
l 
qs = . Data de Bae 
20.0 l+s 


Eqs. (la) and (9) serve to relate the two defined prob- 
abilities, P,, and P,,, to the mean signal-to-noise ratio, 
s*, and the number, K, of pulses added. Thus, with P.,,, 
specified, the threshold level (normalized to system noise 
power) is determined from (la). Eq. (9) then gives P,, 
as a function of s° and K. 

The results for this case of uncorrelated fluctuating 
signal are plotted, for P,,, = 10°*° and square-law detector, 
in Fig. 1. They are there compared with the results of 
a constant-amplitude signal analysis’, the mean signal- 
to-noise ratio in the fluctuating case corresponding to 
the constant S/N ratio in the equal-amplitude case. 

These curves indicate that the performance of the 
system is not impaired very much by fluctuation of 
successive signal echoes so long as K is great enough. 
For should one or two pulses reflected back be of low 
power the chances are that successive pulses will make 
up for this and still enable the signal to be detected. It 
is only for K a small number (less than 5) that the effect 
of fluctuation has, on the average, a deteriorating effect 
on performance. As P,, increases from 50 per cent to 
90 per cent the relative loss in performance due to fluctua- 
tion (or the additional signal power needed to maintain 
performance the same) increases. Thus, for K = 2, the 
fluctuating signal return requires about 0.8 db additional 
signal power to maintain the average performance at 
P,, = 50 per cent. For P,, = 90 per cent, this additional 
power required at K = 2 is 4.5 db. 


(9) 


(10) 


COMPLETELY CORRELATED SIGNAL FLUCTUATIONS 


For the case of completely correlated fluctuating 
signal, the exact signal to be received is unknown, being 
given only by a probability distribution, but once it is 
received it remains fixed over all K pulses. This can then 
be solved using the results cited for constant-amplitude 
signals.’ 

With P,,, and K fixed, P,, is a function of the power 
signal-to-noise ratio. This ratio in turn is directly pro- 
portional to the received signal power which is distributed 
according to a specified probability distribution, and which 
may take on (theoretically) all values from 0 to ©. On 
the average, then, the probability of detecting a signal 
from a slowly-moving airplane will be given by 
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Pa = | PaW@ ds (11) 
0 

where P,, is the probability of detecting a given fixed 

signal S, and W(S)dS is the probability of receiving this 

particular signal S (power). 

Using the results of a constant signal analysis, P,, 
may be plotted vs S (given P,,, and K) and the resultant 
curves approximated by simple functions. From these 
and a specified W(S), then, the indicated integration of 
(11) may be performed and P,, plotted vs S/N. This 
procedure has been carried out for the aforementioned 
distribution 


. eo 8/8 
WG) s= 7 
(S) 5 (7) 
and also for another, arbitrarily chosen, distribution, 
a iS = S'2//2)S'o2 
W(S) = = (12) 


where S, = SV/2/r and S is the mean value of S, as 
previously. Eq. (12) is seen to be the Rayleigh distri- 
bution for the distribution of noise at the output of a 
linear envelope detector. 
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The results obtained are shown plotted in Figs. 2 and 
; respectively, in which the constant signal (nonfluctuat- 
ng) results are also included. These curves show that the 
ffect of completely-correlated fluctuations is to produce 
, fairly constant loss in performance, the additional 
ignal power required to recoup this performance (2.e., 
ive same P,, and P,,) being very nearly the same for 
ll K. As expected also, the distribution of (7) produces 
, greater fluctuation loss than that which would result 
rom a distribution given by (12): The distribution given 
yy (12) has a much smaller deviation about its mean 
alue and so approaches more nearly the 6-function 
istribution of a constant nonfluctuating signal. 

The losses due to completely-correlated fluctuations 
re much worse than those in the uncorrelated case, as 
hown by comparing Figs. 1 and 2. This is due to. the 
act that once a given signal power has been received, 
omplete correlation requires the successive signals to 
e of the same magnitude in power. A low echo return 
vould thus render the detection of the signal very im- 
robable, while for the uncorrelated case this low echo 
ould conceivably be followed by a much higher one on 
he next pulse. 
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ParTIALLY CORRELATED FLUCTUATING SIGNALS 


Although the calculations for the two extreme cases 
of fluctuating signal just treated show the effect of signal 
fluctuation on system performance in terms of the 
Neyman-Pearson derived probabilities, P,, and P,,, it 
is apparent that in general the actual correlation for 
practical systems will lie somewhere between these two 
extremes. Typical echo data and correlation functions 
derived therefrom are shown in the literature, the corre- 
lation approaching zero rapidly as the time interval 
between the successive voltage is increased.’” 

This analysis will treat only the simplest case of two 
signal (or noise) samples added with the sum required to 
exceed a specified threshold level. Using these results and 
those previously obtained for the completely-correlated 
and uncorrelated cases, conclusions may be drawn as to 
the effect of correlation between successive signal pulses 
on signal detection for K greater than two. 

In this analysis the noise is again assumed uncorre- 
lated, as are signal and noise. The singal voltage at the 


16H. M. James, N. B. Nichols, and R. S. Phillips, “Theory of 
Servomechanisms,” M.I.T. Rad. Lab. Ser., McGraw-Hill Book 
Co., Inc., New York, 'N.Y., vol. 25, ch. 6; 1947. 
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input to the detector is assumed to be Gaussianly-dis- 
tributed with normalized correlation function —1 < p < 1. 
The correlation function, p, is defined as 


(a, — #,)( 4s ne) 


A) Gee 


where x, and x, are the amplitudes of the two successive 
samples, assumed normally distributed, @, and #, the 
mean values (here zero), and oj, o> the variances of 2, 
and x, (mean-squared signal voltage). 

The correlation functions plotted in the literature 
represent correlation in signal power, or mean-squared 
voltage, of successive samples. It can be shown’, however, 
that p,, the power correlation, is given simply by 


ey. (14) 


Passing each pulse through a square-law envelope 
detector and adding two successive pulses with corre- 
lation p, the probability distribution of the sum of these 
two samples turns out to be given by’ 


ae aV 
Tg ee 
pV) 967" — sinh Ee 9 ) 


(the detector constant, a,, has here been set equal to 1 
for simplicity). 
Here 


(13) 


(15) 
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ns Be 
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o = (1 +s’), 


a = 


as previously, and 


Pe 
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p’ can be interpreted as a modified correlation function 
arising from the sum of noise and signal before detection. 
The correlation of the two signal samples is thus de- 
creased due to the (assumed) independent noise fluctua- 
tions. In particular, as the mean signal-to-noise ratio, 
s’, becomes very small the over-all correlation function, 
p’, approaches zero. On the other hand, for large signal-to- 
noise ratio, the signal voltages maintain control and the 
correlation is effectively p, that for the signal alone. 

The Neyman-Pearson derived probability that the sum 
of these partially correlated variables will exceed a 
specified voltage threshold level, b,, is then 


foo) 


pV) dV 


bs 


Pate (16) 


Using the exponential equivalent for the hyperbolic 
sine in (15), (16) may be readily integrated to give 


/ if 
Ps. = cosh | wee - | = : sinh Fea: 
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Here 
q. = go/ +), 
b. 
Qn = 2a,Wo ) 
as previously. As a check, with p = 0, p’ = 0, and 
Pipe = Oe ag), (17) 


in agreement with (9) specialized to K = 2. 

As previously a particular choice of the Neyman- 
Pearson probability P,,,, (the probability of the sum of 
two noise variates in the absence of signal exceeding the 
threshold level b,) fixes q,, and P,,, may thus be calculated 
as a function of signal-to-noise ratio for the specified 
value of P,»,. 

This has been done, using (16a), and the results are 
shown plotted in Fig. 4 (P,,, = 10”) and Fig. 5 (2 
10°) for values of the signal voltage correlation function 
p, from 0 to 1 (complete correlation). As stated previously 
p = p,, the correlation function of the reflected signa 
power. 

These results are as would be logically expected from 
the physics of the problem: For an increasing mean signal. 
to-noise ratio (per pulse) the completely correlated case 
becomes progressively worse (in terms of the system 
performance standards) than the uncorrelated case. The 
results for partial correlation fall between these twc 
extremes. This is so because if by chance the signal powel 
on the first returned pulse should be much lower than the 
mean signal power, the signal power on the next pulse 
will be exactly the same, in the case of p = 1. If the signa 
can change from pulse-to-pulse, however (partial corre: 
lation, p < 1) then there is a chance the power returneé 
on the succeeding pulse will be greater, increasing thé 
signal detectability (P,,,). 

But for small mean signal-to-noise ratio the probability 
distribution is much narrower about the mean value, s¢ 
that for uncorrelated signals the chance of the seconc| 
signal pulse being much higher than the first is not as 
great. Thus the system performances for the correlatec 
and uncorrelated cases approach one another. (7.e., for ¢ 
“narrow” probability distribution the introduction 0: 
correlation does not change matters very much, on thé 
average.) With the mean S/N small enough (on Poe 
10°"°, S/N < 10 db), the performance under correlatec’ 
conditions becomes in fact slightly better than witl 
uncorrelated signals. { 

It is also noteworthy to remark that for P,,, < 40 pe 
cent (roughly—the actual value depends upon P,,,,) the 
fluctuating signal cases become “better” than the constan: 
signal case. This is due to the broader probability distri 
bution of the signal plus noise under fluctuating conditions 
For small mean S/N, higher voltages have a greate 
chance of occurring in the fluctuating case than in thi 
constant signal case. 
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These results for partial correlation of two successive 
gnals may also be profitably compared and checked 
ith the results determined previously and independently 
wr the two extreme cases of p = 0 and p = | (Figs. 1 and 
). Thus, the results of Figs. 1 and 2 for the particular 
voice of K = 2 agree fairly well with those of Fig. 4 
ame signal-to-noise ratio for specified P,,,). This check 
particularly satisfying for the case of completely- 
yrrelated signals (p = 1) since this case was previously 
1alyzed using a different approach than that followed 
are [see (11)] and the results were obtained by rough 
yproximation procedures. 
The method of analysis used here could presumably be 
‘tended to the general case of partially correlated 
ictuating K pulses added together after detection, but 
ie mathematics involved would be extremely laborious. 
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However, since the results for the partially correlated 
case fall between the two extremes of p = 0 and p = 1, 
for which the results have been plotted in Figs. 1, 2, and 
3, these latter curves may be considered sufficient as 
bounds on the actual effect of fluctuation in practical 
systems. 

A comparison of the two sets of curves, Figs. 4 and 5, 
indicates that the change in P,,, (or threshold level) over 
the range 10°° to 10°*° (at least) does not affect the 
conclusions discussed above as to differences between the 
correlated and uncorrelated cases. This is true too for the 
previous results obtained for the two extreme cases for 
all K. The effect of increasing correlation (requiring 
progressively more power to give the same P,, and P,,, 
for P,, > 40 per cent) is thus very nearly independent 
Ores 
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Solution of an Integral Equation Occurring in the 
Theories of Prediction and Detection 


K. 8. MILLER} ann L. A. ZADEHT 


Summary—In many of the theories of prediction and detection 
developed during the past decade, one encounters linear integral 
equations which can be subsumed under the general form 
fat R(t, r) x(r) dr = f(t),a < t S b. This equation includes as 
special cases the Wiener-Hopf equation and the modified Wiener- 
Hopf equation /o7 R(| tf — 7 |) x(r) dr =f(t),0 St ST. 

The type of kernel considered in this note occurs when the 
noise can be regarded as the result of operating on white noise 
with a succession of not necessarily time-invariant linear differ- 
ential and inverse-differential operators. For this type of noise, 
which is essentially a generalization of the stationary noise with 
a rational spectral density function, it is shown that the solution 
of the integral equation can be expressed in terms of solution of a 
certain linear differential equation with variable coefficients. 


INTRODUCTION 


N MANY of the theories of prediction and detection 
] developed during the past decade, one encounters 

linear integral equations which can be subsumed 
under the general form 


yi) =f RU, Dx) dr, 


where R(t, 7) denotes the auto-correlation function of a 
random, not necessarily stationary, process; x(7) is the 
impulsive response of a filter (for fixed ¢); and a and b are 
constants, with the range of integration extending from 
a— to 6+ in order to contain any delta function terms 
which may be present at the end points a and b. 

Typical special cases of (1) are: The Wiener-Hopf 
equation’ ° 


yi) = [RU = vale) dr, 


(2) 
O75 tes or 
and the modified Wiener-Hopf equation 
Tt 
va Ri sees s 
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which is encountered in prediction and smoothing,*— 
detection,” '* maximum likelihood estimation,’* ‘° anc 
in the representation of stationary processes."? '°''’ The 
modified Wiener-Hopf equation can be solved explic: 
itly*’*'’* when the spectral density function corresponding 
to R(r) is rational in w, and in a few other special case: 
including that of band-limited white noise."” The homo: 
geneous form of (8) has been treated by De Sobrino, 
Muller,’® Slepian,’’ and, more recently and very compre 
hensively by Youla.’* 

Kernels of the more general form R(t, 7) # RUE — + 
occur when the underlying random process is nonstation: 
ary.’’ *” In such cases it is not possible, in general, t¢ 
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solve (1) explicitly. There are, however, certain special 
cases in which explicit solutions of (1) can be obtained. 
Dolph and Woodbury, in particular, have solved (1) for 
the case where R(t, 7) is the Green’s function of a known 
differential operator. 

A case of greater practical interest which is considered 
in this note is that of a nonstationary process which is 
generated by successive operations on white noise with 
differential and inverse-differential operators. Such a 
process is essentially the nonstationary analog of a 
stationary process which has a rational spectral density 
function. As is well-known, the latter type of process can 
always be generated by operating on white noise with the 
product of two time-invariant differential and inverse- 
differential operators.” 

The approach described in the sequel constitutes in 
effect an extension of the spectrum shaping technique*"”* 
to the nonstationary case. The use of this technique in 
the nonstationary case does not yield an explicit solution 
to (1) as it does in the stationary case.* Rather, it reduces 
the determination of x(7) to the solution of a differential 
equation with variable coefficients. From the practical 
viewpoint, such a reduction is of advantage when one has 
access to an analog computer. For with such computers, 
it is generally much easier to solve a differential equation 
with variable coefficients than an integral equation of the 
form of (1). 


NoraTIoNs AND ASSUMPTIONS 


To begin with, we note that it is sufficient to solve (1) 
for the special case where the left-hand member is of the 
form 6(¢ — &), that is, 


Nera aye pe AG RHO OS® 


If the ‘solution for this case is denoted by R ‘(r, £), then 
by superposition the solution of (1) may be written as 


(a) =f RG, Dy a (5) 


The assumption that R(f, 7) is the auto-correlation 
function of a process which is generated by acting on white 
noise with a succession of differential and inverse-differ- 
ential operators, implies that R(¢, 7) is the composition of 
a kernel I'(¢, \) with itself, that is, 


R(t, ) = fT HTC,» a, (6) 


where I'(¢, \) is the result of operating on a delta function 
6(t — \) with a succession of differential and inverse- 
differential operators, 


cee VIN GS Peo SY CcZ.0 (toa), (7) 


where L, M, ---, Z are specified linear differential opera- 
tors with variable coefficients. 


23Tt is tacitly assumed here that only the spectral distribution 
of the process is of interest. 

24H. W. Bode and C. E. Shannon, ‘‘A simplified derivation of 
linear least square smoothing and prediction theory,” Proc. IRh, 
vol. 38, pp. 417-425; April, 1950. 
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For simplicity we shall assume that there are only two 
operators. The same techniques may be utilized in the 
case of n operators. Thus 


T(t, A) = LD 'MS&(t — d). (8) 


This case is sufficiently general for most practical purposes 
and it includes as special cases those treated in the 
literature.”° 

The operators L and M are assumed to be of orders 1 
and m respectively. The formal adjoint of L is denoted 
by L* and its impulsive response is denoted by L~'(é, &). 
Thus 


Lyle (58) =10(0 2), = Lone aan) 


where the subscript ¢ is used to identify the variable on 
which L operates. In terms of L’‘(t, &), an operational 
expression such as L g(t) may be written in the explicit 
form 


Lg) = | LU, 89 ae. (10) 
SOLUTION OF THE EQUATION 


As a preliminary to the solution of (4) we shall establish 
the following relation: 


ERG) = 2 MS Ge (11) 


where M* is the Lagrange adjoint of M. To obtain (11) 
we make use of (10) to write (8) in the explicit form 


rt,» =f E-, uMyae — Jae. (12) 
Substitution of this expression in (6) yields 
rid =f ff oG, ome — NG, 9 
[Mpa — )]}dEdg aX (13) 


and operating on both sides of (13) with L results in 


LRG =f ff WL vitae — MEG, D 


[M,6(¢ — N]dedg a. (14) 
Since 
DDE: £) a a(t ae é), (15) 
(14) reduces to 
LRi)=f f f[ e-OMee- NEG, 9 
-M,6(¢ — d) dé dg dd 
=f f Mae-vG,d 
-M,6(¢ — d) de dn. (16) 


25 The assumption that I'(¢, \) = LM 6(¢ — X) leads to similar 
results. 
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Now it can readily be shown that, for any reasonably 
well-behaved function F(z, d), 


[ FG, plwae — dg = MFG) 7) 
and 
[ Rr, WIM AG = dk = IEF) ane) 
Consequently (16) may be written as 
L,R(t, 2) = (fe M,a(t — )MAL(r,») dd (19) 
or equivalently, 
L.R(t, 7) = M.M*L“"(7, 0), (20) 


which is the desired result. 
Our objective is now to obtain an expression for the 
inverse kernel R™'(r, £) which is the solution of (4), that is, 


b+ 


/[ RG DRe Gide — 06 1b) aes Oe 


Operating on the left-hand member of (21) with L, yields, 
by virtue of (20), 


ie IDE. CRG 


b+ 
ms / M,M*L-\r, )R7(r, 8) dr 


b+ 
MM | LCR Ge Orde 


MGM AL Sees tae ye (22) 


For the right-hand member of (21) we get 
Hohe ae)iet ae ALO ea) ee Bos O23) 


where the linear combination of delta functions of various 
orders arises as a result of possible discontinuities in the 
right-hand member at the end points ¢ = a and ¢ = b. 
The coefficients A4.(é) and B,(&) as well as the range of 
a are undertermined at this stage. 

On combining (22) and (23) we obtain 


M,MAL* R (é, ) = £5 — £) 


4 Ago G0) Dy Bao Ob) and) 
which on letting 
W(t, ) = L¥'R“(t, &) (25) 
and 
A(t, = >> A,d?(t — a) + DBS '(t — D) (26) 


June 
may be written more compactly as 


M M%yit, £) = L,.6(t — §) + Att, £). (27) 


Regarding this as a differential equation on y(t, &) and 
solving for ¥(¢, £) we have, formally, 
W(t, £) = (M,.M*)"[L.d(t — &) + At, 8] 


a » C'sba(t) (28) 


where the ¢g are linearly independent solutions of MM*¢ = 
0 and the C;() are constants with respect to ¢. Thus, 
operating on w(t, &) with L* yields 


R(t, 8 = LAM*'M,'L, 6(t — &) 


+ L*M*#"M, A(t, £) + Do CeL*oe(t). . (29) 
B 


Now A(t, £) contains delta functions of order at most 
1 — 1. Consequently the second term in (29) contains 
delta functions of order at most 2/ — 2m — 1. Thus the 
solution of (4) is given by 


R“(t,8 = IAM* ML, 8 —&) 


2U—2m—-} 


+ >> A,8'(t — a) 
2l—2m—-1 2m 
Fo ie) 00 cone ele > CaL*$,(t) (30) 


and the undertermined coefficients A,, B., Cs may be 
evaluated as in the stationary case* by substituting (30) 
in (4) and treating the resulting equation as an identity. 
In the case of (30), however, the determination of the 
gs(t) and the constants A,(&), B,(é), and Cs(€) would in 
general require the use of an analog computer. We note 
that for the special case of the modified Wiener-Hopf 
equation, (80) reduces to the explicit expression 


= L(—p" 
R=) = Gee at — 
ts DE Aro tbo 20) wet ant Bao" (nD) 


2m 

Se 28 Cre (31) 
where p = d/dt is the Heaviside operator, M(w*)/L(w’) 
is the spectral density function corresponding to the auto- 
correlation function R(r), and the e “’ are the linearly 
independent solutions of the differential equation 
M(—p’) ¢ = 0. This solution of (3) is equivalent to that 
given by Zadeh and Ragazzini.* 
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Aw EXAMPLE 


To illustrate the main points of the methods described 
above it will suffice to consider a simple example which 
can be handled analytically. One such simple and yet 
nontrivial case is when the operators L and M are of the 
first order and have leading coefficients which are linearly 
increasing functions of time. 


Specifically, let L = tp, M = tp + k where p = d/dt 
and k is a positive constant. In this case L* = —tp — 1 
and M* = —tp + k — 1. The following expressions for 


the various quantities entering into (30) are readily 
derived: 


ze ies 
baal Cg 3 Yar UC are 3 es 

a careers 

wae =1¢-94(2) 
TGA ea eee : We) (33) 

R(t, 7) = at — 7) + 1(t — pee 

+1r- 4 a Ea wed) 
aQ=t", g() =e. (35) 


In the above expressions 1(f) is the unit step function. 
With a knowledge of L, L*, M~‘(t, £) and M*~‘(é, &) it 
is a simple matter to calculate the leading term in (30). 
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Thus 
he kl fy 
RU, = b- B+ raat el 
—k1/(t\ 
qe GS nt or 5 (5) 


+ Ca Cee (36) 


and the delta function terms are absent by virtue of 
21 — 2m — 1 being negative. On substituting this ex- 
pression in (21) and performing the necessary integrations, 
one obtains two equations on C, and C,. Thus C, and Cy 
can be determined. They are 


Pages Cason o£) echpee: = | 

oe 7 
FHT ao 

Cie kes) lie ‘as (k= (8) 


% — | ae e(2) a 
b 


vo 


Substitution of these in (36) yields an explicit expression 
for the inverse kernel R~'(E, &). 

In terms of this kernel the solution to (1) with an 
arbitrary left-hand member y(é) is given by 


oi) = [| RG, Bul a, 
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Generalization of the Class of Nonrandom Inputs 


of the Zadeh-Ragazzini Prediction Model’ 


MARVIN BLUMT 


Summary—tThe prediction theory presented in this paper is an 
extension of the prediction theory of Zadeh and Ragazzini'. It 
differs from their theory in that the nonrandom component of the 
input signal in the Zadeh-Ragazzini model is restricted to a poly- 
nomial of known degree n. In the theory developed here, the 
nonrandom component of the input signal may be any arbitrary 
linear function of a subset of known analytic functions where the 
subset of functions are known a priori but the linear relationship 
need not be. As in the previous solution, the determination of the 
impulsive admittance of the optimum predictor reduces to the 
solution of a modified Wiener-Hopf integral equation. 


THEORETICAL Di1scussION 


HE impulsive admittance can be determined 
al implicitly for a certain class of stationary signal 

and noise. The modifications of the previous 
solution for the more general nonrandom input functions 
are given. The usefulness of the theory is demonstrated 
for an input consisting of the general nonrandom com- 
ponent plus white noise. 

Zadeh and Ragazzini considered the following problem: 
Given an input to a linear filter composed of a stationary 
component M(t) plus a polynomial P(t) of known degree 
n and a stationary noise component N(¢), determine the 
linear filter which will produce an optimum prediction of 
some linear function of the signal component. 

The optimization consists of selecting that linear filter 
whose output is closest to a desired output in the sense that 
the ensemble mean error of prediction is zero and the 
square error of prediction is a minimum. 

This paper considers the same type of model of the 
input except that the nonrandom component is not 
restricted to a polynomial of degree n, but can be any 
linear function of a subset of n + 1 functions, analytic in 
the region 0 S$ ¢ S T + | a |, where the n + 1 functions 
are known a priori, but the linear combination need not 
be known. The quantity 7 is the finite memory of the 
filter and a is the prediction time. 

Thus, the nonrandom component is given by 


P(t) = yS afP,(t) (1) 


where the P,.(f) are known but the vector aj need not be 
known. The impulsive admittance of the optimum linear 
filter in the least square sense is found for this more 
general input model. 


* This report was prepared to document the results of a theo- 
retical investigation into the filter problem of nonrandom inputs. 
It is an extension of a partial solution made previously by L. A. 
Zadeh and J. R. Ragazzini. 

+ Convair, San Diego, Calif. 

1. A. Zadeh and J. R. Ragazzini, ‘An extension of Wiener’s 
theory of prediction,” J. Appl. Phys., vol. 21, pp. 645-655; July, 
1950. 


The solution differs with respect to the previous solution 
in that the minimization is taken with respect to one 
point, only. The point selected is noted by t = 7) where 
Oss ee 

Since the method of solution for the generalized model 
is essentially parallel to the solution obtained by Zadeh 
and Ragazzini, only the results are presented here. For 
a detailed solution see their original paper. 

Let the input to the filter be given by e,(¢) where 


e(t) = S() + NU). (2) 


The function S(é) is the signal and N(é) is a stationary 
random disturbance. The signal has two components: 


SO = PO + MO, (3) 
where the nonrandom component of the signal is 
PO => APO, (4) 
k=0 


and M(t) is a stationary random component. 

Let S*(t) be the desired output and consider an ideal 
predictor whose impulsive admittance is given by k(f). 
Then the relationship between S*(f) and S(¢) can be 
represented by 


S*(t) = ip : LCS eae (5) 
If K(p) is the system response of k(r), then 
S*@) = K(p)S@). (6) 


Let H(p) and W(t) be the system response and impulsive 
admittance, respectively, of the actual predictor and let 
e,(t) be the output of the actual predictor. Then 


Be iP Wire (hen ene 


i Woes (7) 


since W(r) is taken as equal to zero when + > T. The 
error of prediction is given by 
e(t) = e(t) — S*(d). (8) 


The optimum filter is obtained by finding the function 


W(z) such that 
(jav = 0 “Or (ee = 48-0) (9) 


where ( )ay represents the ensemble average and (e),, 
is a Minimum. 


| 
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It is assumed that 5 5 : i 
Oo = (ee =" baa ee é di. (16) 
(M(t) = (Nav = 0 (10) a ; 
or all ¢ so that combining (2), (3), (4), (5), and (7) gives rete 
L 
T n ‘ 
we = f LeaiPat-)W)dr aD) serdar el NE, 
— 0 
LF and 
L 
(S*O)e =f M) DeatPe— dr. (12) $= Wa NO GS 
—0o k=0 


Eq. (9) can now be written as 


aif [Pac — 1)W(r) dr — i. k(r)P,(t — 7) ar| ye 
(13) 


Since (13) must hold for any arbitrary set of constants 
4, 1t follows that when t = 7 then 


i 2, Shiner: 


=f KPT, — dr = OAT), (14) 
vhere k = 0, 1, 2, --- n. 

Note that (14) imposes a system of n + 1 constraints 
nm the impulsive admittance W(r). For example, if 


PO =o = alt 


he solution would involve only two constraints because 
here are only two unknown constants a; and a. There 
re not four constraints because the input contains a 
ubic component of time. 

The functions 


PO) =a? +r) 
nd 
P@) = a + 2%) 


vill each lead to a system containing one constraint. The 
equired filter will be different for each function. Thus the 
ptimum filter design makes use of all the a priori knowl- 
dge about the input function to reduce the mean square 
rror of prediction by reducing the number of constraint 
quations of the form of (14). This error reduction is 
ccomplished at the expense of making the filter more 
pecialized since the filter will be optimum only for the 
arrower class of input functions. 

By combining (2), (8), (5), and (7), the prediction error 
an be written 


= [ wee — 7) + M(t — 7)] dr 


) 


i ie Re > pide 5) 


The mean square value of ¢ is given by 


2The assumed equivalence of the time average and ensemble 
nplies that both M(t) and N(¢) are ergodic as well as stationary 
rocesses. 


It will also be assumed that there is no correlation between 
signal and noise: 


L 


Lim 1/L NOM (G7) dt 20. 


Law 


(19) 
Substituting (15), (17), (18), and (19) into (16) gives 


caf [ WO)WGlduln = 2) + 4x(n — ml dn dre 
= 2 tle / W(1) kr) oul _— To) dr, dr. 


+ ff k)ir)bulrr — 2) dri dre. (20) 

The solution for W(r) requires that the o° of (20) be 
minimized subject to the n + 1 constraints of (14). This 
is a standard isoperimetric problem of the calculus of 
variations. One is led to minimize 


if = o a 2 De Quds (21) 


where the d, are Lagrangian multipliers. By substituting 
(14) and (20), (21) can be written 


ge se Wn) dr, 
{f W(t) lou(rn er T2) + dy(r1 as T2) | dr. (22) 


~ a K(r2)oult1 — T2) dt. — 2 Sy N,Pi(T > — ry}. 


The last term of (20) has been omitted since it is inde- 
pendent of W(r). 

Setting the variation of J = 0, it is found that the 
minimum o” is obtained when W(r) satisfies the integral 
equation 


[ Wr) [but — 7) + dy(t — 7)) dr 


2: > NEO it . k(nou(t — 1) dr, (23) 


where.0e= (=. 0s 


3In Appendix II, a modification of (23) as derived by the referee 
is presented for the more general case when the minimization is 
with respect to the whole time interval instead of Just the point 
t — Mey 
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For a prediction of the Lth derivative of the input 
function at? = T) — a, 

k(t) = 6(7 — @) (24) 
where 6”(u) is the Lth derivative of the Dirac function 
6(w). 

Applying (24) to (14) yields 
Q. = a PAT | SPOT) 
( T= a 
e 
= W(r)P.(T> — 1) dr (25) 


0 


as the set of constraint relationships for the prediction 
of the Lth derivative of the input message at time ¢ = 
T) —a@ 

In general, (23) cannot be solved explicitly in closed 
form. However, for a certain class of functions [¢4(4) + 
¢y(t)], a solution for W(r) is available. The power spectra 
associated with ¢,,(7) and ¢y(7) are given by 


Sus’) = [dale dr (26) 

on 
So) =f astne™ (27) 

Let 
Sw’) = G(jo)G(—jo) = Sulw’) + Sy") (28) 


such that G(jw) and 1/G(jw) are analytic in the right half 
of the jw plane. Let S(w’) be further restricted by the 


relation 
2» _ Aw) 

S@) = Blo) (29) 
where A(w’) and B(w’) are polynomials in w’. 

S@’) = | Gp) |’ (30) 
where p = jw, and 

as Q(p) ae Gy Ope oe ee y ou 
Cire R(p) Ogre: Oipesi ts > sat: bp’ 3D) 


Then it can be shown that: 
WW) = fw) — we — mS ni oe PTs — 9] 


Sul) K (jo) 
A(w”) 


2m l-m l—-m 
- 5 ae rede, Cr! “(i) Ee a Dy on Caen 


+O+¢R@ [4 K jad jadel®* (82) 


Cyu(t) arises from the operation of R(p) on the discon- 
continuities of W(t) at the origin. In (33) the constant 
is included in the term A,u(t). In the more general form, 
using (34), the solution may not contain a constant and 
must be stated explicitly. If (34) contains a constant, the 
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C) may be merged with it to form only one unknowr 
constant. 

For one class of functions the solutions to (34) are par: 
ticularly simple. This class of functions G is defined by 
the properties: G is a linear vector space with basis P,.(é 
kK = 0, 1, 2, --- n; G maps onto itself by any linear 
differential operator 


= d’ 
Let P(t) = >oi-o a,P,(t) where the a, are any set ol 
complex numbers not all equal to zero; then P(t) 
provided* DP(t) «G i.e. 


m™m 


n d’ 
DS, Ds bay dt Pit) = 
0 


7=0 k= 


MS axP (0) 
k=0 

where any two of the sets b;, a, or aj can be choser 
arbitrarily. | 

It is shown in the appendix that 

1) The necessary and sufficient condition that P(é) eC 
is that d/dt P,(t)«G, (where k = 0, 1, --- n). 

2) The functions P,(t) must be ORR et ces solu- 
tions of a linear differential equation with constant 
coefficients if d/dt P,,(t)«G. 

The a, a, +++, @, are the roots of the characteristic 
A(—p’) = 0, and u(t) and u(t — 7) are unit step functions 
which cause W(t) to be zero outside the region where 
Onsale 

To determine the solution for W(t) it is required te 
solve the differential equations 


R(p)" 
Apo) 


The function R(p)’/A(—p’) is a system response 
relating /,(¢) as the output to P,(7) — ¢) as input. 

The solution for W(¢) obtained from Zadeh and Raga- 
zzini' is 


M,(t P,(T,) — t) be! 0) 15°25 eee 


W(t) = fu) — ul — mH Ant, 
1 apy ft? Sue KGOR(— we 
a On R(p) Jes A(w’) dw 


2m l-—m 
=i 5 Be} ot 2AG.d ae) 


l-m 
meh a 1). (33) 


The two solutions, (32) and (33), differ in two respects. 
The term 


Ee | 
yy Mey p’) 


is substituted for )7t-5 fe and the constant Co is added. 
The complete class of functions @ satisfying the con- 


IO 3) (34) 


{ 


* For the symbol e, read “belongs to.”’ 
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tion DGeG are the. set of functions P,(¢) which are 
ossible solutions of an arbitrary homogeneous linear 
fferential equation with constant coefficients. This class 
composed of arbitrary polynomials, sums of exponentials, 
ims of sines and cosines, and products of the above 
inctions such as to be a term in the solution of a homo- 
sneous differential equation with constant coefficients. 
For the class of functions in G 


n R 2 n 
Sy Palle — = EMPL 9. BD 
Application of solution to white noise: 

Let M(t) = 0, k(r) not specified 
Y(t) = 4(0), (36) 


nd 
= ip Z Pate 1) de. 
Beats becomes 
i “TG nee ee > MuPAT>s -). (37) 
Ebatituting (G6) into (37) vives 
Wii) = D MPATo =F, (38) 


ubstituting in (14) the value of W(r) obtained from 
38) gives 


GH n 
, = if De, PAT 7 )P (To at t) dt 
0 k=0 


ntroducing the notation 
Tr 
Su =f Plo - OPATs —Ddt= Su, (39) 
0 
hen 


), = NoSoz == MS aa oe Anni 
= S10Ao ip Suri aes SinAn 


OD 2 Rozen? 
(40) 
r in matrix notation 
Q@ = Sd 


) and \ are column vectors and S is ann X n matrix. 


vhere 
Qo i Sour Sax Son 
) 2 OF n = My S = Sio sir San 
Q, rn Sno Syn 


The necessary and sufficient condition that the deter- 
uinant S be nonsingular in the interval 0 S$ ¢ S T is 
hat the functions P,(7, — 4) (k = 0, 1, 2, --- mn) be 
nearly independent. Thus it is required that no set of 
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constants HL; exist such that 


P(To a t) aa » EP (Lo a t) 
ink 
where P,(f, — 7) is not identically zero in the interval. 
The necessary and sufficient condition that the set of 
functions P,(7') — ¢) be linearly dependent in the interval 
is that the Wronskian of the system of functions P, (7, — 2), 
--+ P,(T’) — t) be identically zero in the interval provided 


the Wronskian of system of functions P,(T, — ?), --- 
P,1(7) — t) does not vanish identically.’ 
The Wronskian is defined by the determinant 
Rak a t) alle aa t) PALS — t) 
WaPo — 9 PPT =) = Pee 
Pe (lop ant) rime (lone) ee (ky == 1) 
in the interval 0 S ¢ S T, where 
PY(T, —-d)= © pr — t) 
7 0 7 dt" 7 0 0 


If the functions P,(7T, — ¢t) are found to be linearly 
dependent, then the system of equations has fewer than 
n + 1 constraints. The linear dependence is eliminated 
by defining a new set of functions such that the system 
of constraints will be reduced to a minimum number and 
the corresponding reduced matrix S will be nonsingular. 

Let S~* be the inverse S matrix and assume that S is 
nonsingular. Then, 


S= Qo 
From (38) we can write 
W@). =P” 


where P’ is the transpose of P and is given by the row 
vector 


[P(To = )P\(T = t) lee Bal fam t)]. 
Therefore 
W@) =P'S-Q 


which is-a linear function of the input function P,(é). 
The mean square error of estimate associated with the 
optimum filter W(7) is given by 


7 = Q'S"0 
where Q’ is the transpose of Q. Finally, if 


(41) 


(42) 


the functions 


P,(T — t) are orthogonal over the region 0 S ¢ S T, z.e., 
. Plo — O)Pi@o — 0 dt = oy | (43) 

then S~* is an identity matrix so that 
Ee (44) 


5R. Courant, ‘Differential and Integral Calculus,” Nordeman 
Publishing Co., New York, N.Y., vol. II; 1986. 


APPENDIX I 


Let 


Pt) = > a,P,(t) (1) 
k=0 

where the a, are any arbitrary complex numbers not all 

equal to zero. Let a class of functions G be defined having 

the following property: P(t)«G@ if DP(t)«G. The operator 

D is a general linear differential equation operator on P(t), 


(2) 


where 6; is a set of arbitrary imaginary numbers not all 
equal to zero. Then the mapping property has the form 


vba GP, 


j=0 k=0 


= 3 ajP,(1) (3) 
where any two of the sets b;, a, or a, can be chosen 
arbitrarily. The problem is to determine the members 
of the class of G which satisfy the property D@eG. The 
following lemmas will be required. 


Lemma 1 


d d’ 
If ae HOG then ap P KhOG 


where j = 0, 1, 2, --- 
Proof: Let 


mand k = 0, 1, 2, --- n. 


d n 
PA) = DY) duPd) 
u=0 
where b,, 1s an arbitrary set of constants not all equal to 
zero and k = 0, 1, 2, --- n 
Set d’/dt’ P,(t) = P,’. Then in matrix notation, we can 


write 

[ease boo bos Doni atl 

Py ie 10 Bes 

1 | beg 

1s Bro Ont Papi: 

or 
PaO. (4) 
Taking the derivative of both sides of (4) gives 

Ee = (DE eam emeneny 28" e(b) nye ale (5) 


Repeating the process gives the recursion formula 


yea | = Coy (2 y (6) 


Since the matrix (b)“*” is a linear transformation on 
P |, each of the terms d’/dt’ P,(t)G. 


1) 
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Lemma 2 


The necessary and sufficient condition that P(é)«G 
is that 


Lao er where ki= 071-25 
dt 
Proof: 
From (1) and (2) we get 
m n d' r 
DP(t).= D7) Do bie Ga Px). (7) 
j=0 k=0 “dt | 
Expanding the right side of (7) gives 
- - d 
DP(t) = bo SS a,P (2) + b, SS Ay see 
k=0 k=0 dt 
ai Sy bs biQk 5,7 £ | Pd. (8) 


7=2 


The first term of (8) obviously belongs to G since AG ¢ G@ 
where A is an arbitrary constant, and the second term, 
belongs in G by the hypothesis. In the third term we can 
substitute >"-9 wj;,P.(t) for d’/dt’ P,(t) from (4) of 
Lemma 1, where u,;, are arbitrary constants. Then the 
third term also belongs to G since it reduces to a linear 
sum of d/di P,(¢) where k = 0, 1, 2, --- n 
Since 
SP BGG (9) 
7=0 
where B; is a set of arbitrary constants and G,; is any 
particular member of G, it follows that DP(t) eG whe 
given d/dt P,(t)«G. 
To show that the condition is necessary, let 


a 


=e Chal ht) OT) (10) 
u=0 y 
where Q,,(t) eG G is the group of all functions not in G). 
Then if Q;,(é)eG, there exists no linear transformation 
of the form | 
tl 
Q(t) = Ds Cinvl(t) (1 

v=0 y 


for all 7 and k. Substituting (10) into (7) gives 


» oP) OD s a,b ;Q;x(t) 


k=0 7=0 


DPO = (12)' 


where the relationship must hold over all arbitrary 
constants a, and b; (a, and b; not all equal to zero). Since 
{ 


y ss anb;Qix(t)G, 


7=0 k=0 


then DP(t)eG. 
However, DP(t) eG if and only if 

2 Ss 4,0;Q;.(t) = 0. 

=0 7=0 


Since a, and 6; are arbitrary sets of constants, 


SS ne 


7=0 k=0 


sand only if Q;,(¢) = 0 over all 7 and k. In particular for 
= 1, Q:.(t) = 0 so that the condition d/dt P,(t)«G@ is 
cessary and sufficient. 

Since the condition 


P™() = (13) 


necessary and sufficient, the solutions of the system of 
ifferential equations of (13) defines the class of functions 


bP 


r 
The solution to (13) is obtained thus:° 
| Let 


/ Ca P,(t) 
nd 


d“ 
di P,(t) = p’y, 


Then (11) can be written as the matrix equation 


ply = by (14) 


here p is a scalar multiplier, J is an n + 1 by n + 1 
lentity matrix, and y is a column matrix. Eq. (14) can 
e written (pf — b) y = 0. Let F(p) be the determinant 
pl — b), and the solution to the system is given by 


F(p)y; = 0, a = 0, i 2, 


rovided F(p) is not identically zero. Since F(p) is a 
olynomial in p, the y; = P;(t) are the complementary 
olutions.of linear differential equations with constant 
oefficients. 

Thus the complete class of functions P(t)eG is the 
undamental system 


PW) = DY aPld) 
vhere 
F(p)P.(t) = 0 k= 


The complete class of functions that belong to G con- 
ists of arbitrary polynomials, sums of exponentials, 
ums of sines and cosines and products of these functions 
yhich could be terms in the solution of a homogeneous 
near differential equation with constant coefficients. 
‘or example, the function P(t) = > oxo a:t’ is a solution 
o the differential equation 


ar sett 
art! 


0, Ih 2, 


P(t) = 0 


ig 
P(t) = Di ae™ 
k=0 


; the solution of a linear constant coefficient differential 
perator D,.,, ot order n + 1 operating on P(¢). 2.e., 
),.,P(t) = 0 where the roots of the characteristic equa- 
ion Daii(p) = 0 are do, dy, do, --- d,, and the roots are 
ll distinct. 


6W. L. Ince, “Ordinary Differential Equations,’ Dover Publi- 
itions, New York, N.Y.; 1944. 
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APPENDIX II 


Modification of Solution for W(t) When the Minimization 
ts with Respect to the Entire Time Interval’ 


The quantities Q, defined by (14) are constants since 
the error is minimized at a particular instant of time. 
In a more general situation where the error is to be 
minimized for all ¢ the Q, are functions of time, Q, = Q,(t). 
Thus — the Lagrangian multiplers become functions of 
time, and the integral (23) assumes the form. 


[ WG loult =e pane ae 


= > ie N(7)P(r — t) dr 


Hie 


The solution of this integral equation, together with the 
constraint (14) (with ¢ substituted for 75) yields W(é) 
and the Lagrangian multipliers \,(f), A2(d), eG): 
The solution for this case is more difficult than that of 
solving the single integral (23). 

There is, however, a reasonably wide class of functions 
P,(t) for which the above system reduces to a single 
integral equation. This class, /, comprises those functions 
P,,(t) which have the property that P,(é — 7) admits of 
representation in the form. 


(nomu(t at 1) d Tn 


PG su FOTO! 


f =1 
where the y4,(t), M = 1, 2, m, are linearly independent 
functions. In this case, (14) reduces to the set of equations 


i a(7)W(r) dr = i as(t)k(t) dr = R'y 


where the 4, are constants, and the system of integral 
equations degenerates into the single integral equation 


fe wi 


ou(t — 7) + by(t — 7)] dr 


mee = Nak(t) + ie k(nou(t — 1) dr 


in which the Lagrangian multipliers \‘y, are constants. 
Utilizing the above development and under the same 

assumptions from which (32) is derived one obtains the 

modification of the solution for w(t) by replacing 


yy Ne Ry 


Dp) {P.(To a t)} 


VEO 
x > Nu A(—p’) {agr(t) } . 


=0 M=1 


7The development of the integral equation is due to the referee. 
Note that GeF’. 
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The Correlation Function of a Sine Wave Plus 


Noise after Extreme Clippings 


J. A. McFADDENT 


Summary—This paper presents a simple formula for the corre- 
lation function of an extremely clipped signal when the input is 
Gaussian noise plus a sine wave of small amplitude. 


ONSIDER two signals x, and x, at the respective 
times ¢tand ¢ + +. Let 


x(t) = y(t) + A sin ot, 
n(t + 2) = w(t + 7) + Asinalt + 9), 


where A is a constant. y,(¢) and y.(¢ + 7) are noise fluctua- 
tions, obeying the bivariate normal distribution with 
zero means, equal variances o, and with a (normalized) 
correlation p(r) [t.e., the expected value of the product 
yi(t)yo(t + 7), divided by the variance]. 

By an elementary statistical calculation we can show 
that the (normalized) . correlation between 2,(/) and 


to(¢ + 7) is 


(1) 


p(7) tase or 2) 


where a’ = A’/2o0’, the signal-to-noise power ratio. 
(This calculation involves an ensemble average for the 
noise and a time average over one cycle of the sine wave.) 
In general, R(7) is called the cross-correlation function 
of 2,(¢) and 2.,(t), and p(r) is the cross correlation of 
yi(t) and y.(t). If y,(é) and y,(¢) are identical, then R(z) 
is the autocorrelation function of x,(¢) and p(r) is the 
autocorrelation of y,(¢). 

Now suppose that 2,(¢) and x,(¢) are extremely clipped; 
7.e., let the outputs be &,(¢) and é,(¢), where 


RG) = 


{@) = il when 2,(t) > 0, 

£(0 (0) fe 
= —] when 7z,(i) < 0, 

where 7 = 1, 2. When no sine wave is present, A = 0 and 

R(r) = p(r), and the correlation between £,(¢) and 


é,(f + 7) is given by’ 
r(rt) = (2/r) sin * R(7). (4) 


When no noise is present, ¢ = 0 and R(r) = cos wr; 
in this case (4) is also valid, the function r(r) being a 
saw-tooth wave. This case can be calculated by ele- 
mentary methods or by considering a sine wave as the 
limiting case of extremely narrow-band noise. 

When a’ is not zero or infinite, (4) does not hold. The 
purpose of this note is to derive an approximate correc- 
tion for small signal-to-noise ratios, 7.e., for a” << 1. 


7+ U. S. Naval Ordnance Lab., White Oak, Md. 

1 See, for example, J. L. Lawson and G. E. Uhlenbeck, “‘Threshold 
Signals,” McGraw-Hill Book Co., Inc., New York, N.Y., p. 58; 
1950. 


Davenport,’ following Rice and Middleton, has given 
a solution for 7(r) in the general case. His solution is a | 
double series expansion in terms of p'(r) cos mwr, with” 
coefficients involving confluent hypergeometric functions 
of a?. The present solution, for the restricted case a” < 1,’ 
is a power series in a’, the coefficients being elementary, 
functions of p(r) and wr. This result should be better 
adapted for computation than Davenport’s, especially 
when p(r) is near unity. 

Although this approximation can be derived from: 
Davenport’s double series, the following method is simpler: 
First, for a given value of ¢, we shall calculate P,,, the 
probability that 2,(¢) and x,(¢ + r) are both positive. | 
Second, we shall obtain the average of P,, over one 
complete cycle of wt. Let this average be P,,. Finally’ 
we shall obtain r(r), the correlation between the outputs) 
é,(t) and &(¢ + 7), as a function of P,,. 

First let us calculate P,,. Under the bivariate normal 


distribution, the probability that y; > —z, and ys > —&% 
is® 
ie 
21) 22 Qra°(1 a8 Pane 
_ Ys oF Ys — 2 PY iY : 
ie ie ef Dye a) a dy,. (5) 


By differentiation with respect to z, and 2, we may: 
expand F(z,, 2.) in a double Maclaurin series, 


Fle,2) = FO, 0) + (22) 2+ (%) 2 | 


al 0Z2/ 0,0 


1 ee a oF 
Se pee 
an 22) | 02; 0,0 Zi ame 02; 02/0, ue 


ark \taee | 
ne (25) 4| T ach oo) | 


For example, two of the coefficients are 


FO, 0) ae z ac (1/2m) sin ' P; 


oF o ‘ 
(2 Ea ee 1/[2r0° (1 — 


If we substitute z; = A sin wf and z = A sin w(t 4+ 7), 
then F(z, 22) becomes P.,,, the probability that x,(f) and 
v(t + 7) are both positive at a given time f. Only the 
even terms of (6) need be considered, since the odd ones 
will integrate to zero over a complete cycle of wt. 


p)*). (7) 


2W. B. Davenport, Jr., “Signal-to-noise ratios in bandpass 
limiters,” J. Appl. Phys., vol. 24, p. 720 (13); 1953. 

3H. Cramér, “Mathematical Methods of Statistics, ” Princeton 
Univ. Press, Princeton, N.J., p. 290; 1946. 
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To obtain P.,, we integrate P., -d(ot) from at-= 0.to 
= 2m and divide by 2r. Pit represents the average 
ction of time in which 2,(¢) and 2,(¢ + 7) are simul- 
neously positive. 
Next we must obtain r(7), the correlation between £, (f) 
nd &(¢ + 7). The mean values of both signals are zero. 
rea the two signals can only be +1 or —1; there- 
re the variances are unity and the correlation r(7) is 
mply the expected value H[é,(A)&(¢ + 7)]. To obtain 
nis average we need consider only four possibilities: 
+1, +1), (+1, -1), (—1, +1), and (—1, —1). Let the 


robability of each combination be P,,, P,_, ete., then 
Pa PRP Rr CrP. © 


y symmetry, P,, = P__ and P,_ = P_,. Furthermore 
ve sum of all four probabilities is unity. Then we conclude 
rat 


r(r) = 4P,, —1, (9) 


nd this relation is the last step in the required calculation. 
If we carry out the foregoing operations, we find for the 
orrelation, 


| in? 2 COSY — p 
| sin p+ a Gis ae 
a‘ (cosy — p)(2 — p cosy — p 
te ee ova) | (10) 


here g¢ = wr. In terms of R, given by (2), the expression 
seven simpler. We have 


= (2/rn) [sin R+ Gad ius ie 
(cosy = (2 — pose — 6’) +0@)]. (1) 


Vhen a’ vanishes we have the limiting case (4). Thus 
11) provides a correction to (4) when a’ is small. This 
orrection is of fourth order in a. To second order, (4) 
; still valid. 

When r(r) is an autocorrelation, p(0) = 1 and r(0) 
hould also be unity. This requirement is indeed satisfied, 
t least to the fourth order. The limiting form of (11) 
hen 7 — 0 appears indeterminate, but it can be evaluated 
rovided p’(0) = 0 and p’’(0) exists. These conditions are 
sually fulfilled for the power spectra which we might 
ssume for the noise. 

One application of the above result is to polarity- 
oincidence correlators, which can be used for signal 
etection.* 


4J. J. Faran, Jr. and R. Hills, Jr., “Correlators for Signal Recep- 
on,’ Harvard Univ., Acoustics Res. Lab. Tech. Memo. 27, p. 
8; 1952. 


R(r) 


BEFORE CLIPPING 


Fig. 1—The autocorrelation function R(r) of a sine wave plus 
Gaussian noise (before clipping) when the sine wave has a 
frequency w, the noise has an ideal low-pass band from 0 to 2a, 
and the signal-to-noise power ratio is a. 


r(r) 


AFTER CLIPPING 


Fig. 2—The autocorrelation function r(r) of a sine wave plus 
Gaussian noise after extreme clipping, when the input signal 
has the same spectrum as in Fig. 1. 


Examples of autocorrelation functions are shown in 
Fig. 1 and Fig. 2. Suppose the spectral density of the noise 
is constant between 0 and 2w and zero elsewhere. Then 
p(t) = sin 2wr/2wr, and this function is given by the 
solid curve in Fig. 1. The broken line in Fig. 1 shows the 
autocorrelation function R(r) after the addition of a sine 
wave with frequency w, when the signal-to-noise power 
ratio is a = 0.2. R(r) was computed from (2). Fig. 2 
shows the corresponding autocorrelation functions after 
extreme clipping, computed from (11). The functions after 
clipping are quite similar to those before clipping except 
for the occurrence of a triangular peak at 7 = 0. 


Za 
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A Note on Two Binary Signaling Alphabets 


DAVID SLEPIAN{ 


Summary—A generalization of Hamming’s single error correct- 
ing codes is given along with a simple maximum likelihood detection 
scheme. For small redundancy these alphabets are unexcelled. 
The Reed-Muller alphabets are described as parity check alphabets 
and a new detection scheme is presented for them. 


INTRODUCTION 


ERTAIN general properties of parity check 

@ alphabets for use on the binary symmetric channel 

were investigated in a recent paper by the author.’ 

In particular, it was shown that maximum likelihood 

detectors for such alphabets assume a particularly simple 

form. In addition certain optimal small-sized alphabets 
were presented. 

This paper describes two simple families of binary- 
signaling alphabets and detectors that are easy extensions 
of the material contained in that earlier paper.’ Both 
families consist of parity check alphabets and both 
contain alphabets of arbitrarily large size. Members of 
the first are designed for use in situations requiring little 
redundancy, and for certain ranges of the pertinent 
parameters these alphabets cannot be excelled. The 
rate for these alphabets approaches unity with increasing 
alphabet size, however, so that they cannot be used to 
approach the Shannon rate in nontrivial cases. It is not 
known whether members of the second family can be used 
to approach the Shannon rate. 

As in the reference cited," we consider communi- 
cation over a binary symmetric channel by means of 
parity check alphabets having & information places, p 
check places, and n = p + k places all told. That is, if 


Gd, +++ b,b. --- b, 1s a typical letter of the signaling 
alphabet, where the a’s and b’s are either zero or one, then 
a, --- a, can be chosen arbitrarily and 
i 
b; = DEcaa., 9, el a gee »?P (1) 
1 


where the dot denotes summation modulo 2. The kp 
entries of the matrix C = (c,;) determine the alphabet 
in question. 


GENERALIZED HAMMING ALPHABETS 
If p and k are chosen so that 


ea = > (?), 


i=o \? 


then a simple parity check alphabet and maximum likeli- 
hood detector can be described which cannot be excelled 


+ Bell Telephone Labs., Inc., Murray Hill, N.J. 
1). Slepian, “A class of binary signaling alphabets,” Bell Sys. 
Tech. J., vol. 35, pp. 203-234; January, 1956. 


by any other alphabet of 2” letters using n-digit binary 
sequences. That is, no other alphabet of the same size has- 
a smaller average probability of error per letter. These 
small redundancy alphabets are generalizations of 
Hamming’s original single error correcting code.” 

The C matrix for these alphabets is constructed as 
follows. The first row contains p ones. Successive rows of 
the matrix are obtained by writing in any order all the 


(*) p-place binary sequences with exactly 1 zero, then in 


P 
2 


2 zeros, etc., until k rows have been written. If c,; is the 
element in the 7th row and jth column of this kxp matrix, 
then the parity check rules are given by (1). 

The matrix C just described also serves as a code book 
for the maximum likelihood detector for this alphabet. | 
For each received (possibly erroneous) letter, form the 
parity check sequence fif. --- f, where | 


| 
any order all the ( ) p-place binary sequences with exactly 


k 
fia e; + De -cida © pn ones 
i=1 

and where d; is the binary symbol in the 7th information . 
place of the received letter and e; is the binary symbol 
in the jth check position of the received letter. If f,f.--- f, 
is the same as the rth row of C, the binary digit in the rth | 
information place of the received letter should be altered. 
If fifo --: f, is not listed among the rows of C and does 
not contain exactly three 1’s, the binary digits in those 
check places of the received letter should be changed that 
correspond to the places of fif, --- f, that have the 
binary digit 1. If the parity check sequence f, --- f, has 
exactly three 1’s and does not appear as a row of C, a row | 
of C having four 1’s is located such that three of these 
four 1’s are in the same places as the three 1’s of the parity — 
check sequence. Let this row of C be the ith row and let 
the 1 in this row that does not correspond to a one in the | 
parity check sequence be located in the jth column. 
Then in the received letter, the 7th information place and 
the jth check place should be altered. 

The alphabets and detectors just described correct 
2”—1 single errors if n > 2”—1. If 


3 
2 ee 


v 


the alphabets correct all (") single errors and in addition 


2”— | —n double errors. 


?>R. W. Hamming, “Error detecting and error correcting 
codes,” Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950. 


: 
: 
(956 


Proof of the above statements follows readily from 
Slepian.* The description of the generalized Hamming 
alphabets presented here is simpler than that given by 
him. The detection rule has not been given before. 


REED-MULLER ALPHABETS 


The Reed-Muller alphabets* can also be described as 
parity check alphabets. Each letter has n = 2” binary 
places of which 


= ee 


i=0 \t 


are information places and p = n — k are check places. 
For each non-negative integer value of m and r < m 
there is one Reed-Muller alphabet. 

It is convenient to label the entries in the k information 
places of a typical letter of an alphabet by the symbols 
Bo, Ai, Az, ***, Amy Aro, Miz, ***, Aim—1)my A123, ***, A123...r) 

= * 5 @im—r+1) (m—r+2)...m- Lhat is, the a’s are labelled by 
all k combinations of the integers 1, 2, ---, m taken r or 
fewer at a time. Similarly, the check places, bio... (41). 
Dio. .r(r42)) °° * » Oie...m, are labelled by all 2” —k combi- 
nations of the integers 1, 2, , m taken r + 1 or more 
at a time. It will be convenient to denote a typical a or b 
with j subscripts by a4” or bY’ respectively and to write 
a C 8B for two subscripts a and @ if all the numerals in a 
are also present in 6. The special subscript o is to be 
considered as the empty set and is contained in every 8. 


—i- ') 
‘ 1s 
tsi) 


Let D‘,; be one or zero accordingly as (? - 


odd or even respectively. The parity check rules for the 
Reed-Muller alphabets can then be written 


Oy 
bs 


=; Deca. 


acp 


(2) 


7=r+1,r- 2, --- m, all possible £. 

Here all sums are modulo 2 and the a sum is over all 
indexes a that are i-fold combinations of the j integers 
that comprise 8; ¢.g., if B = 1346, the a sum is fori = 2, 
eee Go aie + Qi4 + Ane + O34 + A36 + O46. 

A maximum likelihood detector for the Reed-Muller 
alphabets can be constructed in a_ straightforward 
manner. No simplification in the detector has been found 
that depends on the special nature of the Reed-Muller 
alphabet. For large m and r the labor involved in con- 
structing the detector is considerable. 

An important feature of the Reed-Muller alphabets 
is that every two letters of an alphabet with parameters 
m and r differ in at least 2” " places. This implies that a 
detector can be built which will correct all errors in a sent 


3 Slepian, op. cit., paragraph 27. 

47. S. Reed, “‘A class of multiple error-correcting codes and the 
decoding scheme,” Trans. IRE, vol. IT-4, pp. 38-49; September, 
1954. 
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letter provided these errors be 1 = 2” "'—1 or fewer 
in number. Such a detector can be constructed as follows. 
First a code book is made which associates with each n 
place binary sequence 2,, 2:, --*, 212, °°, 23...» having 
! or fewer ones an (n — k)-place parity check sequence 


Jism ye ome Chef's areciven by, 
io =e ee (3) 

i=0 acp 

7=r+1,r+2,---,m, all possible B. 


The code book contains >> /~o (") pairs of entries. For each 


received (possibly erroneous) letter, an (n — k)-place 
parity check sequence is formed as in (3) where the digits 
of the received letter are substituted for the z’s. If the 
parity check sequence is not in the code book, the detector 
makes no decision as to the sent letter. If the parity 
check sequence is in the code book, the corresponding z 
entry having / or fewer ones is found. Those places of 
the received (possibly erroneous) letter are altered 
that correspond to places of the z entry that have the 
digit 1. 

It is not difficult to show that if the parity check 
sequence has | or fewer 1’s, the detector prescribes that 
no changes be made in the information places of the 
received letter. This results in a simplification of the 
detector when it is only desired to make corrections in the 
information places. The code book only needs to be 
constructed for those z sequences that have at least one 
lin their information places. 

The description of the Reed-Muller alphabets given 
here is new as is the detector described. In some cases 
this detector is simpler than that given.’ 

The parity check rules (2) for the Reed-Muller alphabets 
can be derived as follows. Consider the m-row by 2” 
column array of the components of the vectors 21, ®, °°, 
Xm as given by Reed’s.* The successive columns of this 
array are a listing of the integers from 0 to 2” —1 in 
usual binary notation. The columns of this array can 
be labelled with sets of integers or a-symbols as used 
in (2) of this note. In particular, the column having 
1’s in rows 2), 22, -- + 2; is labelled by the a symbol 2,22 - - - 2;. 
The column containing no 1’s is labelled by the special 
symbol o. 

Permute the columns of the array so that the labels 
are in the natural order 0, 1, 2, ---, m, 12, 13, ---, 123 --- 
m. Consider the rows of this array as vectors and denote 
the ith row by y;. Vectors whose subscripts are a symbols 
can now be formed from these y’s. If @ is the collection 
of integers 7,7,---7;, then the 6th component of y, is the 


product of the @th components of y;,, Yiz, °°*, Yi; The 
vector y, has all its components unity. 
The vectors Yo, Y1, °**, Yi2, °°°> Yr23, ete., correspond 


respectively to Reed’s vector I, 11, +++, Vi®o, +++, UX, 
etc. The y vectors have the distinguishing feature that 
the component of y, with label 8 is unity if and only if 


on & [ie 
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Reed’s’ (5) shows that the Reed-Muller alphabets are 
obtained by forming all possible linear combinations 
(modulo 2) of the first k y-vectors. The same alphabet 
will be obtained if all linear combinations are formed of a 
set of k vectors w, that are linearly independent combi- 
nations of the k y,.’s. The following is such a collection of 
vectors: 


Ds 


Boda 


Wa = (5) 
a composed of r or fewer integers, 
8 composed of r or fewer integers. 


From (4) and (5) it follows that the component of w, 
labelled w will be 1 if and only if there are an odd number 
of subscript symbols 8 such that a C 6 and 6 C u. If 
a ( yp, then, the uth component of w, will be zero. 

It is convenient to treat the case when a C yu in two 
parts. 

1) If wis comprised of 7 < r integers and a is comprised 
of 7 < 7 integers, then there are 2’-’ 6 symbols such that 
a C BC uy. Since 2’~‘ is even except when j = 1, this 
shows that the first k components of w, are zero with the 
single exception of the component labelled a@ which has 
the value 1. The w, are therefore linearly independent. 

2) If wis comprised of 7 > r integers and a is comprised 
of 7 < r integers, then there are 


ee) a oe 
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B symbols of r or fewer integers such that a C 6 C um. 


V Vv V 1 


it follows that 


or 


= (ashes y mod 2. 
r—1 
Therefore, when » is comprised of more than r integers, 
the component of w, labelled uw is 1 if and only if a C uw 
and ( Ss ) is odd. 
r—1 

Consider the array of n columns and k rows whose 
entry in the row labelled a and column labelled u is the 
component of w, labelled uw. From the foregoing it is seen 
that the first k rows and k columns of this array consist 
of the k X k unit matrix. The last n — k columns are linear 
combinations of the first k columns. Indeed, it is easily 
seen that the column labelled u is the mod 2 sum of those 
columns whose labels are the names of the rows that have 
a 1 in the column labelled yp. This statement, combined 
with the preceeding paragraph, establishes the parity 
check rules (2). 

The detector described for the Reed-Muller alphabets 
follows trivially from the results obtained by Slepian.* 


| 
: 
: 
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Generating a Gaussian Sample’ 


S. STEIN anp J. E. STORER{ 


Summary—tThe general theoretical difficulties in analyzing the 
effect of a random input signal on a known system are pointed 
out. Basically, if certain output statistics are computed directly, 
each statistic represents a complete, separate problem. An alter- 
native analytical computational procedure is suggested, using a 
Monte Carlo type technique in which the output is obtained by 
numerical integration from sequences of values which represent 
members of the statistical ensemble of the input process. For such 
applications, or for other possible uses such as in testing, it is 
necessary to generate statistical sequences, analogous to tables of 
random numbers. 

Techniques are discussed for analytically generating such 
sequences, to correspond to gaussian probability distributions 
which are further characterized by arbitrarily specified power 
spectra or autocorrelation functions. The procedure makes use of 
the standard tables of random numbers, these numbers being 
distributed uniformly and without correlation. The exact statistical 
generation of N values of a sequence is shown to require, in general, 
the diagonalization (or solution for the eigenvalues and eigen- 
vectors) of an Nth order matrix; two simpler approximate procedures 
are also described. 


INTRODUCTION 


N THE COURSE of a recent investigation, the 
| problem arose of determining the effect of a non- 

linear system upon a signal which could only be 
specified statistically. The input-output relationship was 
resolved into the form of an integral, where however, the 
integrand was not linearly related to the random input 
process. Therefore, despite the fact that the input process 
was gaussian and random, it was impossible to obtain 
predictions on the probability distribution of the output. 
It was found impossible, moreover, to compute analy- 
tically even the simpler moments of the output. Hence 
no information was available on an analytical basis. 

As a consequent attempt to resolve the problem, 
numerical integration was considered for the multiple 
integrals which described the moments. These compu- 
tations were, however, estimated to be much too lengthy 
and time consuming, even for setting up on a large scale 
computer. Nevertheless, the concept of numerical inte- 
gration did serve to bring forth another idea, which is 
somewhat similar in philosophy to the Monte Carlo 
technique. Namely it was suggested that the original 
integral relating the input and output process should be 
used, and that a numerical integration should be carried 
out in which the input process is represented by its sample 
values at the appropriate intervals. By carrying out the 
integration for many values of the independent variable 
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(usually time) of the output process, successive sample 
values of the output process would be obtained. The 
latter sequence or several such sequences could then be 
analyzed statistically, to obtain a statistical picture of 
the output, even including probability distributions. 
Obviously a simple procedure of the same sort could be 
used to obtain a description of a random process y/(f), 
related to some other known random process x(t), by any - 
type of functional relation 


y(t) = Flx(t)] (1) 


where F' in general may not be a linear process. The only 
limitations on these procedures are that 1) sufficiently 
short sampling intervals must be used, 7.e., intervals 
appreciably shorter than the correlation time of any 
process involved; 2) the length of the sequences obtained, 
or the number of sequences calculated out, must be 
sufficient to give reliable statistics (the rms error generally 
varying inversely as the square root of the number of 
values taken in the averaging); and 3) it is necessary to 
know a sequence of sampled values of the input process, 
which obey the proper statistics. In general, however, 
the statistics are the known quantity, while sample 
sequences of the process are not necessarily available. 
While an experimental approach to obtaining such 
sequences may sometimes be feasible, it is not always 
possible. Therefore, it becomes desirable to consider 
whether a set of random values could be generated 
analytically to satisfy given statistics. Such sets of values 
could obviously find application in fields far removed 
from the solution of functional relationships, for example 
in such fields as applied psychology. Since most random 
processes originate as gaussian processes, and any further 
modification of the members of the ensemble can be taken 
into account by direct computation, the investigation 
was confined to a study of gaussian processes, which are 
characterized completely in terms of their autocorrelation 
function or power spectrum. The feasibility of analytical 
generation of sample values to correspond to given gaussian 
statistics is the subject of the remainder of this paper. 


GENERATING AN UNCORRELATED (‘“WuitE NOISE’’) 
GaAussIAN RANDOM PROCESS 

An uncorrelated gaussian process is completely described 
by its first-order probability distribution 

1 Cae 
o V 20 
where & 0 has been assumed (there will be no loss in 
generality). It is desired to choose successive, independent 


W(x) = (2) 
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values of x, conforming to this distribution for a given 
value of o. 

It is first necessary to consider the property of random- 
ness. It is apparent, after a little thought, that probably 
the only reasonable way to incorporate this property is 
to somehow make use of a sequence of purely random 
numbers, such as are available in random number tables. 
The successive values of x must then be related to the 
successive values of another quantity, r, chosen from a 
sequence of random numbers. The sequence of values, 7, 
can be considered to be a random stationary process, 
with a distribution uniform over (0, 1). Fortunately, the 
required transformation is simple, and well-known; it is 
obtained readily from the requirement that W,(r) dr 
Wile) dé, where) —o xy <7 oS area Ie 

The appropriate solution to this differential equation 
is readily shown to be given in terms of the error-function 


as 
i x 
r= ae a erf (2). 


For each value of r selected from a random number 
sequence, it is merely necessary to invert this relationship 
in order to obtain a set of values of x which conform, 
statistically, to the required distribution [3]. 


(3) 


SHAPING BY A FILTER 


The next problem is to extend the procedure so as to 
generate sequences of correlated numbers, 7.e., sample 
values from a gaussian random process having a power 
spectrum of finite width. Perhaps the first thought which 
might occur to those in electronics is to accomplish this 
as the analytical equivalent of passing the signal through 
a shaping filter, having the desired spectrum. 

Thus, with the set of values of the white noise given as 
x,, a set of values y, would be computed at intervals of 
7, by the formula 


this = MS h(kr — jr)x;. (4) 
Here h(t) is the filter response function. It should be 
noted that (4) is mathematically only a linear trans- 
formation. This thought will be considered again shortly, 
but it should be recalled here that any such transformation 
preserves the stationary gaussian random character of 
the process. If the coefficients are interpreted in terms of 
the filter concept, where h(t) 1s defined to be zero when 
t < 0, the summation will actually have a finite upper 
index, at 7 = k. In addition, there will always be a finite 
lower limit to the summation, corresponding to the fact 
that only a finite number of terms will be summed. It 
would seem intuitively desirable, and appears to lead to 
desirable effects, if the same number of x; values are used 
in computing each y,. Thus, summation over N + 1 
terms is properly written as 


. 
he Sy h(kr — jr)x;. (5) 
j=k—N 
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There is, of course, a natural question as to the accuracy 
which may be obtained in obtaining values for the 
y-process by this method. A preliminary error analysis, 
and some sample computations which have been made, 
seem to show that it is not an unreasonable procedure; it 
is particularly attractive because of the computational 
ease as compared with the more exact procedure to be 
described next. 


Tur MvuuLtripIMENSIONAL GAUSSIAN PROCESS 


There is a more mathematical manner of looking at the 
problem of generating samples to fit a gaussian process 
with a given autocorrelation function. This is to consider 
that if N sample values are required, then the Nth order 
joint distribution density of this set of N samples, averaged 
over'an ensemble of such sets, is specified by the corre- 
sponding Nth order gaussian probability distribution. 
That is, the y, are now to be generated randomly so that 
they satisfy 


1 1 
(2n)*” | w | 


Leys te he 
c exp = a (u "eattny (6) 


Here, the y;’s are assumed to occur at times ¢;, and the 
desired autocorrelation function, R(t; — ¢t;) = Yj, 1s 
contained implicitly in the N X WN _ autocorrelation 
matrix, uw, defined by 


(u)i; = R(t, — 4), 


with | » | the determinant of the matrix, and yp * 
inverse. 

It is immediately apparent that one way to generate 
such a process, using information already available, is to 
seek a linear transformation connecting the y-process to an 
x-process in which successive values are uncorrelated [1, 2]. 
This is mathematically completely equivalent to seeking 
the transformation which diagonalizes the quadratic 
form in (6), a technique well known in matrix algebra. 
With this transformation, the x-process obtained will 
satisfy an Nth order joint probability distribution in 
the form 


WwW +> Yx) = 


1/2 


(7) 


its 


N 2 
Wy(a, --- tw) = Const. exp = Ss a (8) 


k=1 Th 
where each of the x, can thus be considered as statistically 
independent, and hence can be generated by the procedure 
introduced earlier in the second section. 

The diagonalization of the « matrix, it will be recalled, 
is equivalent to finding the eigenvalues and eigenvectors 
of the matrix y, and will yield a result such that (P‘uP);; = 
\;6,;, Where \; are the eigenvalues of the » matrix, and 
the diagonalizing P may be constructed by using the 
successive normalized eigenvectors of » as columns. The 
appropriate transformation connecting the y-process to 
the x-process can be simply shown then to be 


y = pPx (9) 


: 
1956 


or alternatively 


y = Paz, (10) 


vhere y and x are column vectors containing the N values 
of each sequence, and A is the matrix of eigenvalues of u, 
such that 


CNeGootie: (11) 


Then, with the transformation so constructed, the 
listribution of the x; is given to within scale factors by 


1 N 
exp _ Se wa 
t=1 


These x; are seen to be indeed independent of each other 
und gaussianly distributed, and hence may be chosen 
simply as before by choosing a set of random numbers 7; 
rom a random number table, and inverting the relations 


—— ak + erf («.,/%) | 


With the x; thus selected, the corresponding y; are now 
obtained by the simple matrix multiplication (9) or (10). 

Since the matrix maultiplication is simple, once P and 
A are found, the basic difficulty of the technique proposed 
has been concentrated on finding the eigenvalues and 
sigenvectors for the given autocorrelation matrix yw. This 
may be a formidable computational problem, and is 
discussed further in a later section. 

One interesting point may be noted from (12). Namely, 
although the y;, are considered as a set of successive points 
from a stationary gaussian process, the x, can no longer be 
so considered, because of the differences in the effective 
Jeviation (o, = 1*/2,) associated with each x,. Another 
way of looking at this is that the procedure has corrected 
for the use of only a finite number of terms in the x 
sequence, by changing the spread associated with some of 
these. This modification of the uncorrelated process is 
therefore no longer a simple modification of a white noise 
process, but actually more subtly uses another process 
to obtain the desired result. The necessity for doing this 
to get the desired exact result undoubtedly accounts, at 
least in part, for the error inherent in the shaping filter 
concept. 


(12) 


(13) 


User or CONDITIONAL PROBABILITIES 


Still another alternative procedure is based on the use 
of conditional probabilities, and would be particularly 
useful if the desired process were governed by a low-order 
distribution. Thus, if yn, -°+, Ym+s-1 have already been 
chosen, then y,,., is given in terms of the conditional 
joint probability by 


W,, Gee rie te Um ue) f 
Ps (2 m+k Yon ares Ym a) Poss a == : ae (14) 
a Y : | Y : ; : ; W bas ore tee) 
Thus, if Yn, °-*, Ym+x-1 have been selected, this gives a 


probability for Yn4% as 


Wy ) = Ci Se tans iad Ke 2am" 
m+k} — a 2 


(15) 
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Thus a value can be chosen for y;,,,, consistent with the 
statistics and the previous k values, by using the same 
error-function relation as before, with (y,,., — 8) as the 
random variable. It should be noted, however, that the 
evaluation of the coefficients A and B above will require 
the knowledge of u* where u is both of kth and of (k + 1)th 
order. This process can be really convenient then, when the 
process is assumed governed by a kth order distribution, 


with k low, where any new point need only be related to 


k — 1 previous points. In such a case, the computations 
would be considerably shortened by using this procedure. 
Actually, for a kth order governing distribution, the first 
k points could be obtained simultaneously by the matrix 
diagonalization described earlier, and succeeding points 
by the procedure described here. Then only two matrix 
inversions, of a kth and (k + 1)th order matrix, would be 
necessary, because the same matrix coefficients will occur 
each time. The “low” value mentioned for k need not be 
severely low, but would be governed probably by the 
availability of convenient computer programs for handling 
the matrices involved. In fact the main advantage to this 
procedure is the fact that if only matrices of a certain low 
order need be dealt with, it might be possible to make 
use of computer programs already written out for rapid 
solution of such matrices. It should however be pointed 
out that the procedure described above will give exact 
results, only if the process is actually governed by its kth 
order probability distribution density. 


DISCUSSION OF THE AUTOCORRELATION MATRIX 


It is apparent that, in general, diagonalization of the 
matrix » would have to be solved by use of a computing 
machine, and it will be of primary concern as to what 
order, N, of matrix can be dealt with accurately on a 
machine. This will be a major problem in obtaining the 
desired sequences, particularly limiting the length of the 
sequences. However, it should be emphasized that for any 
given power spectrum or autocorrelation function, such 
sequences would have to be computed only once, and 
would then be available for use in whatever problems the 
statistics arose. 

In addition, it should be added here that it is possible 
to achieve a completely analytic diagonalization in at 
least one case of major interest, namely, the gaussian 


0 2 162i] (2) 
Markov process. Here, assuming o = 1, R(f) = e “ 
: —a|(i-7)tl a 
and the matrix elements are u;; =e ” (16) 
: 6 . : —a|t| Oye 
where ¢ is the sampling interval, with p = e ay bes 


readily shown that the eigenvalues are of the form 


eye _ 
= = = 7 
. ley Pete COSO, ae 


where ¢, are the successive solutions of the transcendental 
equation 


tan No = aa (18) 


| 2 
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The corresponding result for the eigenvectors is that 
the kth component of the jth normalized eigenvector is 
given by 


seat yra( sin fen 
a A; sin ¢; 


2 
N(1-+ =p’ 4 2p%cos };) (Ll 80) 


-[p sin (k — 1)¢; + sin kd, ] 
(ea oe 


DE (19) 


It can easily be shown from the results for \,; and ¢,; that 


__| sin N¢; | 


/ | sin ¢; AD) 


and hence the factor 1/\; (sin N@;)/(sin ¢;) is simply a 
+1 factor, depending on the value of N@;. 

Thus, in the case of a Markov process the stated problem 
is completely solved. The precision is limited only to the 
precision which it is desired to use in solving the trans- 
cendental equation for ¢, and the precision to which 
subsequent simple arithmetic processes of multiplication 
and division are carried out. 

It is interesting to note the clustering of the eigenvalues 
about ’ = 1. It is easily verified by checking back, that 
if o° ¥ 1 is used, the values obtained for the eigenvalues 
are just multiplied by o°, while the eigenvectors remain 
the same. This clustering is to be expected, since the sum 
of the eigenvalues is equal to the trace of the autocorre- 
lation matrix. Since all elements of the diagonal are equal 
to o, this sum is No’, and hence the average value of the 
N eigenvalues is always equal to o”. This has unfortunate 
implications for the use of a large scale computer to deal 
with diagonalization of a more arbitrary autocorrelation 
matrix, since the order of the matrix which can be handled 
within a given accuracy may be more severely limited 
when the eigenvalues are clustered in this fashion, than if 
they were spread. 


June 
CONCLUSION 


The primary purpose of this paper has been to focus 
attention on a possible method for dealing numerically 
with a class of problems involving stochastic processes, 
and to discuss the associated question of how to obtain 
stochastic samples by computational or analytical means. 
Should the concepts be of sufficient value, it should be 
possible to compute out tables of sample sequences once 
and for all, for use in these procedures. 

It has been shown that a sample sequence of N values — 
of a gaussian process can be generated exactly by diagonali- 
zation of an Nth order matrix, for example, by machine 
computation. By use of a general symmetry property for 
the eigenvectors of such a matrix, it is possible to reduce 
the order of the matrices to be dealt with by half, although 
this also loses much of the symmetry of the original matrix. 
Two possible approximate generating procedures were 
also suggested, neither of which, however, will give 
exactly the desired statistics. Although in the latter cases 
it would be improper to apply the resulting sequences 
without an analysis of the likely error, the procedures 
may well be useful for generating long sequences, even 
though only approximate. Basically, they would obviate — 
the limitations in sequence length caused by the lmi- 
tation on the order of a matrix which can be handled on 
a digital computer without large errors. 

The other possibility, analytical diagonalization of the 
Nth order autocorrelation matrix, may be possible in 
special cases. Results of this type have been given here 
for the simple gaussian Markov process, which is the 
only process which has seemed amenable in this manner 
to date. 
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On the Information Invariant® 


SATIO OKADAt 


The contents of the book ‘‘Cybernetics,” by N. Wiener, 
s far more inclusive than conventional electrical engineer- 
ng or mathematics. As the title of this paper suggests, 
ts intention is not to treat a part of so-called ‘cybernetics’ 
out to specify a mathematical invariant of information, 
Which will enable us to treat all problems of the field from 
1 unified but nonstatistical standpoint. 

As the first step, an ordinary definition of information 
quantity and its application are given in the first part. 
This is a brief introduction of a previous paper by the 
author.’ 

At first, the essential difference between transmissions 
of energy and information is explained by an example of 
‘common battery” telephone system. Then the quantity 
of information is defined by quantization of continuous 
change from simplest telegraph to solid color television. 
And in all cases, there holds 


loge C= : FT loge; 
where 
C = number of permutation, 
x = factor depending on transient, discriminating 
ability, accuracy of intensity of signal, 
F = frequency band, 
T = communicating time, 
e; = number of kinds of intensity. 


Physical and information theoretic meaning of x is dis- 
cussed in detail, especially in reference to absolute delay 
of transmission and necessity of history caused by re- 
duction of x. 

As its applications, the following are given: 


1) Importance of time-division multiplex telephone 
system. 


*{Bull. of Yamagata University, Eng., Vol. 3, pp. 95-111; May, 

1954. Copies are obtainable directly from the author. 

(| Microwave Res. Inst., 55 Johnson St., Brooklyn 1, N.Y. 

18. Okada and Sakae Fujiki, ‘The intrinsic substance and 
metrization of communication,” J. Jap. Inst. of Elec. Commun. 
Eingrg., no. 204, pp. 147-157; March, 1940. (In Japanese.) English 
ubstract: Nippon Elec. Commun. Engrg., no. 21, pp. 64-65; July, 
1940. 


2) Principle of secret telephone by diluting with camou- 
flage information. 

3) Practical limit of compander and expander by 
quantization principle. 

4) Axiomatic possibility of substantial compression of 
band by a model in thought based on acoustic 
spectrum by Roland concave grating. 

5) High information capacity of high-frequency phe- 
nomena (not only electromagnetic but also gravi- 
tational, if possible). 

6) Possibility of filtering and separating of electric 
waved by prism and grating. 

7) Importance of flip-flop (Kallitrotron) as 
information element. 

8) Conversion between audition and vision, ete. 


high 


However, if we give a glance at any communication 
method, we find that most communicating apparatuses 
were invented by appropriate reduction of original infor- 
mation quantity. It means that adequate dispatching in 
information quantity enables us to invent a new communi- 
cation device. Also, the information quantity of the silent 
worship of Catholics, Quakers, or “Zazen”’ of Japanese 
Buddhism, is a little difficult to determine. Then what is 
kept invariant during these information reducing pro- 
cesses? The previous example represents extremely poor 
information from the standpoint of the present information 
theory. 

Proposals and suggestions as to an answer of this 
problem form the second part. At first, “contents” of two 
informations are defined to be equivalent, if both cause 
identical phenomenon which is independent of energy 
consumed for these informations. It can satisfy three 
axioms of equivalence: Reflexive, symmetric, and transi- 
tive. Also “inclusion” of lattice theory is defined by caused 
phenomena or behaviors. When lattice theory is applied, 
atoms of elements in ‘Hasse diagram” may be regarded 
as absolute information units. The adjective ‘‘semantic”’ 
in the information theory seems to be an investigation of 
logical construction of contents, but it can be ultimately 
regarded as correspondence with each part of behaviors 
or phenomena. 
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Corre spondence 


‘In Which Fields Do We Graze?’’* 


I was very much interested in the above 
editorial because of my own views on 
the generality of the theory of information 
and also my interest as a former employee 
of the Federal Telecommunication Labora- 
tories in the lead that company is taking 
in the administration of the PGIT. 

The argument that the applications of 
information theory to other fields be left 
to specialists in those other fields is further 
evidence of the parochial attitude of 
scientists who forget that their field began 
as the investigation of all knowledge. 
From a study of the principles of infor- 
mation theory and the understanding of 
what constitutes information, a much 
larger view of all human endeavor unfolds, 
so that I realize that although other 
fields such as management, biology, psy- 
chology, business relationships, law, and 
so on, do often reach the same conclusions, 
as are analytically determined in our field, 
they do it by such a laborious method that 
their handicap is evident. (I include several 
ancillary fields such as game theory and 
symbolic logic as part of the science of 
information.) 

Several examples may illustrate: Prof. 
Lillian Lieber of Brooklyn Polytechnic 
Institute and St. John’s University felt 
that the entire Aristotelian syllogism which 
occupied philosophers for several thousand 
years could be condensed to three relation- 
ships of Boolean Algebra (symbolic logic). 
She also indicated that all American legal 
principles, when analyzed by mathematical 
logic, may result in such discovery of 
inconsistencies logical as to render a good 
part void. 

The work done at the University of 
Illinois biology laboratories has indicated 
that the redundancy principle is not 
unknown in nature. Many human functions 
are carried out by a redundant use of 
nerves and muscles. This accounts for the 
extraordinarily high “accuracy of trans- 
mission” if we merely substitute a few 
biological terms for “accuracy” and “‘trans- 
mission.’”’ Perhaps ‘‘successful functioning”’ 
would do for the former, and “process” 
such as birth, grasp, lift, walk, etc. for the 
latter. I was particularly struck by the 
similarity when studying the operation of 
the Naval Ordnance Research Calculator 
formerly at Columbia University. There 
was this huge machine (the brain), being 
instrumented (a logic), fed (power) cooled 
to a constant temperature (not too far from 
98.6 degrees F), and defective plug-in 
assemblies replaced on a continuous in- 
spection basis which was so close to cellular 
replacement in the animal organism as to 
be startling! The chief engineer on the 


*Trans. IRE, vol. IT-1, p. 2; December, 1955. 


project had determined (empirically) that 
the best technique was to inspect regularly 
and replace only those plug-in assemblies 
which were going bad (due to loss of G,, 
in their tubes etc.) restoring those units 
which were good although some of these 
had been in operation much longer than 
statistical averages. This is precisely what 
occurs in living cells! The use of code 
checking and redundancy in the computer 
is analogous to the redundancy of the 
organism. A familiarity with information 
theory by the biologist could help him 
predict how many original cells are 
necessary to insure a satisfactory biological 
function with a given percentage of “‘trans- 
mission accuracy,” that is, “successful 
functioning.” In particular, I was thinking 
of the reason why nature may require 
millions of male cells per successful fertili- 
zation of an ovum when only one is needed 
to “transmit the information.’’ Could this 
account for the minute percentage of 
imperfect, deformed babies at birth? 

In business the applications of infor- 
mation theory are so numerous that I feel 
that those businessmen who are ignorant 
of it through not being communication 
engineers or who have not learned the 
principles naturally are at a distinct handi- 
cap. Statements such as, ‘“‘We expect to pay 
a high price for it,” do not convey infor- 
mation, and so I personally do not act upon 
them. The statement, ‘‘We expect to pay 
a higher price than last time,’ conveys a 
measurable amount of information which 
can be expressed in bits; and the statement, 
“We expect to pay no less than $8 nor 
more than $10,” can even be instrumented 
on a computer together with other contract 
details. 

The last example is one enabling me to 
solve many engineering problems by an 
extension of the elements of information 
from one field to another. This device is 
familiar to engineers who use analogs in the 
solution of problems, but the more extensive 
generalization, the concept of the propo- 
sitional function of Bertrand Russell, and 
the algebra of classes, permit a bolder use 
of analogs retaining only those elements 
which convey information. 

Shannon’s theory is admittedly limited 
to the engineering aspects of communi- 
cation and does not include what he calls 
the semantic aspects which he considers 
irrelevant to the engineering problem. 
However, an investigation into the rami- 
fications of the science, such as, for ex- 
ample, music (what is it about the 
transmitter, the channel, and the receiver, 
which conveys information?) must of ne- 
cessity include almost all other scientific 
disciplines. 

This lengthy letter is prompted by the 
desire to do something to counteract the 
narrowing of that branch of our science, 
information theory, which seems to me to 


June 


hold promise of providing the key to what 

makes almost everything tick, and not to 

restrict it merely to radio and wire com- 
munication. 

Max HoBpeRMAN 

Bergen Laboratories 

Fair Lawn, N. J. 


You bring up a very interesting question 
in your editorial. 

Having attempted to apply Information 
Theory to a field somewhat removed from 
radio, I might have a vested interest in 
claiming that our field is the universe. 
Seriously however, I believe that since 
Information Theory has been developed 
as a new tool by our group of engineers, we 
can be considered duty bound to utilize the 
tool in every way that it may be of benefit. 
This is especially true, since our group 
includes the only nucleus of the craftsmen 
capable of utilizing Information Theory and 
evaluating such utilization. Thus any 
member attempting to extend the scope of. 
the theory, can only do so properly within 
out limited group. No other place exists. | 
For that reason, we have as little right to 
disown our products as to disown our 
physical offsprings. 

Until other groups are formed into which 
we can extend its scope, it seems to me we 
are duty bound to follow every rich vein 
into whatever field it leads. Such a course 
of action would be nothing new for the IRE. 
The IRE has grown to include every group 
that used the tools initially developed for 
the radio industry. Thus, for instance, the 
electronic computer group, industrial elec- 
tronics group, instrumentation group, ultra- 
sonic engineering group, medical electronics 
group, and substantial portions of the other 
groups have “grazed into fields” that were 
far removed from the original concept of 
radio. The editor of the ProckEDINGS OF 
THE IRE comments on that fact in the 
February, 1956 issue (page 151) and points 
out that a minority of the members are 
actually engaged in radio. Thus, we have 
both logic and precedent on the side of not 
limiting the investigations of the Infor- 
mation Theory Group. 

SAMUEL BaGno 
Chief Eng., Ultrasonic Div. 
Walter Kidde and Co., Ine. 
Belleville, N. J. 


I am writing in response to the question 
“Tn Which Fields Do We Graze?” It is my 
feeling that the PGIT should be interested 
in the broadest possible range of appli- 
cations of information theory and com- 
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nication theory, both because the further 
velopment and application of these fields 
ll thereby be promoted, and because the 
st and most interested audience for papers 

such work is to be found among the 
x31T. However, care should be exercised 
} avoid bringing papers to the attention of 
© PGIT merely for their novelty; they 
ould represent good information or com- 
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munication theory. New applications should 
also be brought to the attention of people 
within the specialized field of the new appli- 
cations, but the type of presentation required 
there will be quite different from what would 
suit the PGIT. 

The problem would be simplified by 
defining “Information Theory” as the 
study of measures of information, regardless 


TG 
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